
AtAndDev/ultramix-v2

Hugging Face · Updated 2026-03-03 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/AtAndDev/ultramix-v2
Description:
---
dataset_info:
  features:
  - name: messages
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
  - name: source
    dtype: string
  splits:
  - name: train
    num_bytes: 6071843774
    num_examples: 1658760
  download_size: 6013546656
  dataset_size: 6071843774
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

```
argilla/magpie-ultra-v1.0: 1,920,000 turns (38.54%)
MegaScience/MegaScience: 1,700,000 turns (34.12%)
enPurified/ultrachat_200k_sft-enPurified-openai-messages: 375,590 turns (7.54%)
Magpie-Align/Magpie-Reasoning-V1-150K: 300,000 turns (6.02%)
OpenLeecher/lmsys_chat_1m_clean: 200,000 turns (4.01%)
efficientscaling/Z1-Code-Reasoning-107K: 200,000 turns (4.01%)
enPurified/Hermes-3-Dataset-enPurified-openai-messages: 169,922 turns (3.41%)
enPurified/smoltalk-creative-writing-enPurified-openai-messages: 99,000 turns (1.99%)
HuggingFaceTB/everyday-conversations-llama3.1-2k: 17,505 turns (0.35%)
============================================================
Total turns: 4,982,017
```
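The percentages in the source breakdown can be re-derived from the turn counts. The following is a minimal sketch, not part of the original card: the names `TURNS_PER_SOURCE` and `source_shares` are hypothetical, and the counts are copied by hand from the listing above.

```python
# Turn counts per source, copied from the dataset card's breakdown.
TURNS_PER_SOURCE = {
    "argilla/magpie-ultra-v1.0": 1_920_000,
    "MegaScience/MegaScience": 1_700_000,
    "enPurified/ultrachat_200k_sft-enPurified-openai-messages": 375_590,
    "Magpie-Align/Magpie-Reasoning-V1-150K": 300_000,
    "OpenLeecher/lmsys_chat_1m_clean": 200_000,
    "efficientscaling/Z1-Code-Reasoning-107K": 200_000,
    "enPurified/Hermes-3-Dataset-enPurified-openai-messages": 169_922,
    "enPurified/smoltalk-creative-writing-enPurified-openai-messages": 99_000,
    "HuggingFaceTB/everyday-conversations-llama3.1-2k": 17_505,
}


def source_shares(turns):
    """Return each source's share of all turns, in percent, rounded to 2 places."""
    total = sum(turns.values())
    return {name: round(100 * count / total, 2) for name, count in turns.items()}


if __name__ == "__main__":
    total = sum(TURNS_PER_SOURCE.values())
    print(f"Total turns: {total:,}")
    for name, share in source_shares(TURNS_PER_SOURCE).items():
        print(f"{name}: {TURNS_PER_SOURCE[name]:,} turns ({share:.2f}%)")
```

Because each share is rounded independently, the listed percentages need not sum to exactly 100%.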

Provider:
AtAndDev