five

Llama3 中文数据集

收藏
魔搭社区2026-05-16 更新2024-05-15 收录
下载链接:
https://modelscope.cn/datasets/zhuangxialie/Llama3-Chinese-Dataset
下载链接
链接失效反馈
官方服务:
资源简介:
## 中文微调数据集 ### 已全转为ShareGPT格式(直接使用要删除README、dataset_infos.json) ### firefly-train-1.1M - 包含了23种常见的中文NLP任务的数据,并且构造了许多与中华文化相关的数据,如对联、作诗、文言文翻译、散文、金庸小说等。对于每个任务,由人工书写若干种指令模板,保证数据的高质量与丰富度,数据量为115万。 ### CodeChat - 主要包含逻辑推理、代码问答、代码生成相关语料样本。 ### shareAIShareGPT-Chinese-English-90k - 中英文平行双语优质人机问答数据集,覆盖真实复杂场景下的用户提问。(包含大量多轮对话) ### ruozhiba - 弱智吧数据问答,据说比较锻炼模型的心智能力。 ### 附带Python脚本,可统一转为ShareGPT格式 ### 原版未处理格式汇总 - https://modelscope.cn/datasets/zhuangxialie/SFT-Chinese-Dataset/summary #### 下载方法 :modelscope-code[]{type="sdk"} :modelscope-code[]{type="git"}

## Chinese Fine-tuning Dataset ### All data has been converted to ShareGPT format (delete README and dataset_infos.json before direct usage) ### firefly-train-1.1M - Contains datasets for 23 common Chinese natural language processing (NLP) tasks, as well as numerous samples related to Chinese culture, such as couplets, poem composition, classical Chinese translation, prose, Jin Yong's novels, etc. For each task, several manually written instruction templates are adopted to ensure high data quality and richness, with a total of 1.15 million data entries. ### CodeChat - Mainly contains corpus samples related to logical reasoning, code question answering, and code generation. ### shareAIShareGPT-Chinese-English-90k - A high-quality parallel bilingual Chinese-English human-machine question answering dataset covering user queries in real and complex scenarios, including a large number of multi-turn dialogues. ### ruozhiba - Question answering data from Ruozhiba Bar, which is reported to help improve the mental reasoning ability of AI models. ### Accompanying Python script for unified conversion to ShareGPT format ### Unprocessed Original Format Summary - https://modelscope.cn/datasets/zhuangxialie/SFT-Chinese-Dataset/summary #### Download Methods - :modelscope-code[]{type="sdk"} - :modelscope-code[]{type="git"}
提供机构:
maas
创建时间:
2024-04-24
搜集汇总
数据集介绍
main_image_url
以上内容由遇见数据集搜集并总结生成
二维码
社区交流群
二维码
科研交流群
商业服务