MagicData-CLAM-Conversation_CN
MagicHub Open-Source Community | Updated 2024-05-27 | Indexed 2024-06-08
Download link:
https://magichub.com/datasets/magicdata-clam-conversation_cn/
Description:
This dataset comprises 97,184 Chinese natural conversation sentences across 15 topics: Life at Home, Education & Healthcare, Military & War, Science & Technology, Climate & Environment, Humanities, Business & Economy, Digital Devices, Sports, Entertainment, Daily Life, Fine Arts, Politics and Law, Career Development, and Religion and Faith. The portion released as open source here was contributed exclusively by 644 collectors in China, each with a distinct ID, and is authorized by Beijing Magic Data Technology Co., Ltd. Each conversation is held between two speakers around a single topic, and the context stays logically related to that topic. The dataset is suitable for training large language models (LLMs) in multi-turn conversation, contextual logical reasoning, and end-to-end conversation.
Created:
2024-05-27
Collected Summary
Dataset Introduction

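Background and Challenges
Background Overview
MagicData-CLAM-Conversation_CN is a Chinese natural conversation dataset designed for fine-tuning large language models. It contains 97,184 sentences covering 15 diverse topics, such as daily life, technology, and entertainment, contributed by 644 collectors. The data is presented in a spontaneous speech style and consists of 322 groups of logically related conversations, averaging 302 turns per conversation. It is suited to training a model's contextual conversation and logical reasoning abilities.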
The summary above was collected and generated by 遇见数据集.
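
As a quick consistency check on the figures quoted in the overview, the minimal Python sketch below recomputes the average conversation length from the stated totals. The constant and function names are illustrative and are not part of the dataset or any MagicHub tooling. Reading each conversation as pairing two distinct collectors is an assumption, although it is consistent with 644 = 2 × 322.

```python
# Figures quoted in the dataset description.
TOTAL_SENTENCES = 97_184
NUM_CONVERSATIONS = 322
NUM_COLLECTORS = 644

# Stated in the description: each conversation has two speakers.
SPEAKERS_PER_CONVERSATION = 2


def average_per_conversation(total: int, conversations: int) -> float:
    """Return the mean number of sentences per conversation."""
    return total / conversations


avg = average_per_conversation(TOTAL_SENTENCES, NUM_CONVERSATIONS)
print(f"Average sentences per conversation: {avg:.1f}")

# Assumption (not stated in the source): every collector takes part in
# exactly one conversation, so collectors = conversations * speakers.
implied_collectors = NUM_CONVERSATIONS * SPEAKERS_PER_CONVERSATION
print(f"Implied collectors: {implied_collectors}")
```

The computed average rounds to the 302 quoted in the overview, which suggests that figure counts sentences per conversation rather than paired exchanges.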



