Large Model Multi-turn Dialogue SFT Dataset_Chinese (大模型多轮对话SFT数据集_中文)

Name: Large Model Multi-turn Dialogue SFT Dataset_Chinese
Creator: 始智AI
Published: 2024-05-31 14:10:26
License: not specified

始智AI · Updated 2024-05-31 · Indexed 2024-06-01

Download link:

https://wisemodel.cn/datasets/MagicData/MAGICDATA-CLAM-CONVERSATION_CN

Resource description:

To help readers better understand our multi-turn dialogue dataset, we selected 100,000 dialogue turns for this open-source release, the "Large Model Multi-turn Dialogue SFT Dataset_Chinese" (大模型多轮对话SFT数据集_中文). The data is drawn from the 晴数智慧 (MagicData) LLM multi-domain, highly natural SFT multi-turn dialogue text dataset. The released subset was contributed exclusively by 644 collectors in China, each with a distinct ID, and was collected under authorization by 北京晴数智慧科技有限公司. Each dialogue is held by two collectors around a single topic, and each utterance is logically connected to the preceding context. The dataset is suited to training large models for multi-turn (back-and-forth) conversation and contextual logical reasoning, and to training end-to-end conversational large models.
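As a rough illustration of how a two-person, single-topic dialogue of this kind might be prepared for SFT, the sketch below maps alternating turns to chat-style messages and builds one training pair per reply. The listing does not document the file layout, so the input shape (a plain list of utterance strings), the `user`/`assistant` role mapping, and the function names `turns_to_sft_messages` and `build_training_pairs` are all assumptions made for illustration, not the dataset's actual schema.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]

# Assumption: the two collectors alternate strictly, and the first
# speaker is treated as the "user" side of the conversation.
ROLES = ("user", "assistant")


def turns_to_sft_messages(turns: List[str]) -> List[Message]:
    """Convert alternating utterances into role-tagged chat messages."""
    # Drop empty utterances first so role alternation stays consistent.
    cleaned = [t.strip() for t in turns if t.strip()]
    return [{"role": ROLES[i % 2], "content": t} for i, t in enumerate(cleaned)]


def build_training_pairs(messages: List[Message]) -> List[Tuple[List[Message], Message]]:
    """Build one (context, target) pair per assistant turn.

    The context is every message before the target, so later pairs
    carry the full preceding dialogue history.
    """
    return [
        (messages[:i], msg)
        for i, msg in enumerate(messages)
        if msg["role"] == "assistant"
    ]


if __name__ == "__main__":
    # Hypothetical dialogue, not taken from the dataset.
    dialogue = ["周末有什么打算？", "想去爬山。", "去哪座山？", "香山吧，离得近。"]
    for context, target in build_training_pairs(turns_to_sft_messages(dialogue)):
        print(len(context), "context messages ->", target["content"])
```

Keeping the whole history in each context reflects the description's point that every utterance depends on the preceding dialogue; a real pipeline would also need to handle truncation to the model's context length.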

Provider:

始智AI

Created:

2024-05-31

Collected Summary

Dataset Introduction