bluelightai-dev/clt-mixed-data-tokenized-Qwen3
Source: Hugging Face. Updated: 2025-12-04. Indexed: 2025-12-20.
Download link:
https://hf-mirror.com/datasets/bluelightai-dev/clt-mixed-data-tokenized-Qwen3
Resource description:
# Mixed Dataset Summary
Generated on 2025-12-03 21:58:19 UTC.
- Total samples: 2,600,000
- Train samples: 2,547,999
- Validation samples: 52,001
- Train fraction: 0.98
- Shuffle seed: 85028

| Source | Dataset ID | Samples |
| --- | --- | ---: |
| pretrain | bluelightai-dev/clt-pretrain-data-dedup-tokenized-Qwen3-1024 | 2,000,000 |
| posttrain | bluelightai-dev/clt_posttrain_data_tokenized | 600,000 |
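
The counts in the summary can be cross-checked with a few lines of arithmetic. The sketch below is not part of the dataset's tooling; the helper name `summarize_mix` and its return keys are hypothetical, and only the numbers are taken from the summary above. It confirms that the train and validation splits add up to the total, that the two sources add up to the same total, and that the observed train fraction matches the stated 0.98 to within rounding.

```python
def summarize_mix(total, train, validation, sources):
    """Check the reported counts for consistency and derive simple ratios.

    `summarize_mix` is a hypothetical helper written for this illustration;
    it is not an API of the dataset or of any library.
    """
    if train + validation != total:
        raise ValueError("train + validation does not equal the total")
    if sum(sources.values()) != total:
        raise ValueError("per-source counts do not equal the total")
    return {
        "train_fraction": train / total,
        "source_shares": {name: n / total for name, n in sources.items()},
    }


# Numbers copied from the summary above.
summary = summarize_mix(
    total=2_600_000,
    train=2_547_999,
    validation=52_001,
    sources={"pretrain": 2_000_000, "posttrain": 600_000},
)

print(f"train fraction: {summary['train_fraction']:.6f}")
for name, share in summary["source_shares"].items():
    print(f"{name}: {share:.4f}")
```

The reported train split is one sample short of exactly 98% of 2,600,000 (which would be 2,548,000), so the observed fraction is marginally below 0.98. The summary does not say how the split sizes were rounded.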
Provided by:
bluelightai-dev



