thomas-yanxin/MT-SFT-ShareGPT
收藏Hugging Face2024-08-18 更新2024-12-14 收录
下载链接:
https://hf-mirror.com/datasets/thomas-yanxin/MT-SFT-ShareGPT
下载链接
链接失效反馈官方服务:
资源简介:
MT-SFT-ShareGPT数据集是一个用于大语言模型指令微调的高质量数据集,包含英语、中文和其他语言的数据,总数据量为5,563,444条。数据集分为13个子类别,涵盖了信息检索、推理、规划、编辑、编码、数学、角色扮演、数据分析、创意写作、建议寻求、头脑风暴、翻译等多个任务。数据处理使用了多个模型进行评分、分类和质量控制,确保数据的安全性和高质量。数据集格式符合ShareGPT规范,适用于训练大语言模型。
The MT-SFT-ShareGPT dataset is a high-quality dataset for fine-tuning instructions of large language models, containing data in English, Chinese, and other languages, with a total data volume of 5,563,444. The dataset is divided into 13 subcategories, covering tasks such as information seeking, reasoning, planning, editing, coding, math, role playing, data analysis, creative writing, advice seeking, brainstorming, translation, and more. Data processing involves multiple models for scoring, classification, and quality control to ensure data safety and high quality. The dataset format adheres to the ShareGPT specification, making it suitable for training large language models.
提供机构:
thomas-yanxin



