DRD: Chinese Diplomatic Rhetoric Dataset for Supervised Fine-Tuning of Large Language Models

Name: DRD: Chinese Diplomatic Rhetoric Dataset for Supervised Fine-Tuning of Large Language Models
Creator: Science Data Bank
Published: 2025-12-19 06:22:39
License: No description available

Source: DataCite Commons. Updated: 2025-12-19. Indexed: 2025-04-16.

Download link:

https://www.scidb.cn/detail?dataSetId=c626220792a1446d96b66861e955fed5

Description:

The dataset is derived from the routine press conferences held by the spokespersons of China's Ministry of Foreign Affairs between 2000 and 2024. A total of 20,745 Q&A pairs were collected and curated to form the Chinese Diplomatic Rhetoric Dataset (DRD), intended for supervised fine-tuning of large language models. The aim is to provide a specialized dataset on diplomatic dialogue strategies that helps existing Chinese language models comprehend and respond to diplomatic discourse in an international context more accurately. The accompanying paper introduces the N-Jaccard text similarity algorithm, which reduces sensitivity to text length and takes word order into account, and thereby captures semantic relationships and logical connections more effectively. Core information was extracted from the data with the GPT-3.5 Turbo large language model, converted into contextually appropriate topics, and refined to generate input prompts and output responses. Both machine and human reviewers audited and corrected the initially processed data to ensure the dataset's quality. Built on a large body of high-quality Q&A data, DRD reflects the conciseness, accuracy, and artistry of diplomatic language.
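The description does not define N-Jaccard, so the following is only a minimal sketch of the general idea it builds on: Jaccard similarity computed over character n-grams rather than single tokens, which makes the score sensitive to local ordering. The function names `char_ngrams` and `ngram_jaccard` and the default `n = 2` are assumptions for illustration, not the paper's method, and this plain variant does nothing to correct for text length.

```python
def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Return the set of contiguous character n-grams in `text`.

    Character n-grams are used here because Chinese text has no
    whitespace word boundaries; a word segmenter could be swapped in.
    """
    if len(text) < n:
        # Text shorter than n: treat the whole string as a single gram.
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_jaccard(a: str, b: str, n: int = 2) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| over n-gram sets.

    Illustrative only: this is ordinary n-gram Jaccard, not the
    N-Jaccard algorithm from the paper (which is not specified here).
    """
    grams_a = char_ngrams(a, n)
    grams_b = char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty strings are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)
```

With `n = 1` the score ignores order entirely (`"ab"` and `"ba"` are identical), while with `n = 2` the same pair shares no bigrams, which is the sense in which n-grams bring word order into a set-based measure.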

Provider:

Science Data Bank

Created:

2024-08-05
