AtAndDev/ultramix-v2
收藏Hugging Face2026-03-03 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/AtAndDev/ultramix-v2
下载链接
链接失效反馈官方服务:
资源简介:
---
dataset_info:
features:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 6071843774
num_examples: 1658760
download_size: 6013546656
dataset_size: 6071843774
configs:
- config_name: default
data_files:
- split: train
path: data/train-*
---
```
argilla/magpie-ultra-v1.0: 1,920,000 turns (38.54%)
MegaScience/MegaScience: 1,700,000 turns (34.12%)
enPurified/ultrachat_200k_sft-enPurified-openai-messages: 375,590 turns (7.54%)
Magpie-Align/Magpie-Reasoning-V1-150K: 300,000 turns (6.02%)
OpenLeecher/lmsys_chat_1m_clean: 200,000 turns (4.01%)
efficientscaling/Z1-Code-Reasoning-107K: 200,000 turns (4.01%)
enPurified/Hermes-3-Dataset-enPurified-openai-messages: 169,922 turns (3.41%)
enPurified/smoltalk-creative-writing-enPurified-openai-messages: 99,000 turns (1.99%)
HuggingFaceTB/everyday-conversations-llama3.1-2k: 17,505 turns (0.35%)
============================================================
Total turns: 4,982,017
```
数据集信息:
特征字段:
- 名称:消息(messages)
子字段:
- 名称:内容(content)
数据类型:字符串
- 名称:角色(role)
数据类型:字符串
- 名称:来源(source)
数据类型:字符串
数据集划分:
- 划分名称:训练集(train)
字节数:6071843774
样本数:1658760
下载大小:6013546656
数据集存储大小:6071843774
配置项:
- 配置名称:default
数据文件:
- 划分集:训练集(train)
路径:data/train-*
---
argilla/magpie-ultra-v1.0:1920000条对话轮次(占比38.54%)
MegaScience/MegaScience:1700000条对话轮次(占比34.12%)
enPurified/ultrachat_200k_sft-enPurified-openai-messages:375590条对话轮次(占比7.54%)
Magpie-Align/Magpie-Reasoning-V1-150K:300000条对话轮次(占比6.02%)
OpenLeecher/lmsys_chat_1m_clean:200000条对话轮次(占比4.01%)
efficientscaling/Z1-Code-Reasoning-107K:200000条对话轮次(占比4.01%)
enPurified/Hermes-3-Dataset-enPurified-openai-messages:169922条对话轮次(占比3.41%)
enPurified/smoltalk-creative-writing-enPurified-openai-messages:99000条对话轮次(占比1.99%)
HuggingFaceTB/everyday-conversations-llama3.1-2k:17505条对话轮次(占比0.35%)
============================================================
总对话轮次:4982017
提供机构:
AtAndDev



