晴数智慧 High-Quality LLM Multi-Turn Dialogue SFT Dataset

Name: 晴数智慧 High-Quality LLM Multi-Turn Dialogue SFT Dataset
Creator: MagicHub
Published: 2026-06-07 03:30:44
License: No description available

Source: OpenDataLab. Updated 2026-06-07. Indexed 2024-05-09.

Download link:

https://opendatalab.org.cn/MagicHub/LLM-SFT-Dataset

Resource description:

This dataset contains 150,000 rounds of natural Chinese dialogue sentences, contributed by 663 speakers (368 male, 295 female) from 7 provincial-level regions of China: Jiangsu, Sichuan, Shandong, Shanxi, Beijing, Guangdong, and Hainan. In each dialogue, two speakers converse on a single topic, and the earlier turns are closely tied to the current one. The dataset is suited to training large language models (LLMs) for multi-turn, back-and-forth conversation and for contextual logical reasoning.
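One common way to use two-speaker dialogues like these for supervised fine-tuning is to pair each utterance with the turns that precede it. The sketch below assumes a hypothetical in-memory layout (a list of `speaker`/`text` records) and a hypothetical helper name, `build_sft_samples`; the dataset's actual file format is not described here, and the sample dialogue is invented for illustration rather than taken from the dataset.

```python
from typing import Dict, List


def build_sft_samples(turns: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Turn one dialogue into (history, response) training samples.

    `turns` is an assumed layout: {"speaker": ..., "text": ...} dicts
    in spoken order. Every turn after the first becomes a target
    response, with all earlier turns as its history.
    """
    samples = []
    for i in range(1, len(turns)):
        samples.append({
            "history": [t["text"] for t in turns[:i]],
            "response": turns[i]["text"],
        })
    return samples


# Invented example dialogue (not from the dataset).
dialogue = [
    {"speaker": "A", "text": "Have you watched any good films lately?"},
    {"speaker": "B", "text": "Yes, I saw a documentary about the ocean."},
    {"speaker": "A", "text": "What did you like about it?"},
]

samples = build_sft_samples(dialogue)
for sample in samples:
    print(len(sample["history"]), "history turn(s) ->", sample["response"])
```

Keeping the full history in each sample preserves the dependence of the current turn on earlier ones, which is the property the description highlights. A real pipeline would likely also truncate long histories to fit the model's context window.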

Provider:

MagicHub

Created:

2023-11-28

Collected summary

Dataset introduction

Background and challenges

Background overview

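The 晴数智慧 High-Quality LLM Multi-Turn Dialogue SFT Dataset is a public dataset containing 97,184 rounds of natural Chinese dialogue sentences, drawn from real conversations in which 644 contributors discussed 15 varied topics (such as leisure and entertainment, education, and healthcare). It is suited to training the multi-turn dialogue and contextual logical reasoning capabilities of large models. The corpus is authentic, emotionally rich, and broad in domain coverage. It was processed through human-machine collaboration to ensure high quality, and it is released under licenses such as CC BY-NC-ND 4.0 that restrict it to non-commercial use.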

The content above was collected and summarized by 遇见数据集.