bluelightai-dev/clt-pretrain-data-v3
Source: Hugging Face. Updated 2026-01-30; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/bluelightai-dev/clt-pretrain-data-v3
Resource description:
# Mixed Dataset Summary
Generated on 2026-01-30 19:59:29 UTC.
- Total samples: 3,000,000
- Train samples: 2,909,999
- Validation samples: 90,001
- Train fraction: 0.97
- Shuffle seed: 9822222
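The card does not show the script that produced the split, so the following is only a minimal sketch of how a seeded shuffle plus a train fraction can yield reproducible train and validation subsets. The helper `seeded_split` and the toy size of 1,000 items are illustrative assumptions, not part of the dataset's tooling; the first few lines simply check that the reported counts are consistent with each other.

```python
import random

# Figures copied from the summary above.
TOTAL = 3_000_000
TRAIN = 2_909_999
VALIDATION = 90_001
TRAIN_FRACTION = 0.97
SHUFFLE_SEED = 9822222

# The two splits should account for every sample.
assert TRAIN + VALIDATION == TOTAL

# Realized train fraction; it should sit very close to the nominal 0.97.
realized_fraction = TRAIN / TOTAL


def seeded_split(n_items, train_fraction, seed):
    """Hypothetical helper: shuffle indices with a fixed seed, then cut once.

    This is one common way to build such a split; the dataset authors
    may have used a different procedure.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # same seed gives the same order
    cut = round(n_items * train_fraction)
    return indices[:cut], indices[cut:]


# Toy run on 1,000 indices rather than the full 3,000,000 samples.
train_idx, val_idx = seeded_split(1_000, TRAIN_FRACTION, SHUFFLE_SEED)
```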
| Source | Dataset ID | Samples |
| --- | --- | ---: |
| dolma-3 | bluelightai-dev/dolma3_mix-150B-1025-sample | 2,500,000 |
| dolmino-3 | bluelightai-dev/dolma3_dolmino_mix-100B-1125-sample | 500,000 |
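
As a quick sanity check on the table above, the per-source sample counts can be turned into mixture shares. This is a small sketch using only the numbers listed in the card; the `SOURCES` dictionary and variable names are illustrative.

```python
# Per-source sample counts copied from the table above.
SOURCES = {
    "dolma-3": ("bluelightai-dev/dolma3_mix-150B-1025-sample", 2_500_000),
    "dolmino-3": ("bluelightai-dev/dolma3_dolmino_mix-100B-1125-sample", 500_000),
}

# Sum of the per-source counts; should match the reported total.
total = sum(count for _, count in SOURCES.values())

# Fraction of the mixed dataset contributed by each source.
shares = {name: count / total for name, (_, count) in SOURCES.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%} of {total:,} samples")
```

By sample count, the mix is five parts dolma-3 to one part dolmino-3. The card reports samples rather than tokens, so the token-level ratio may differ.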
Provider:
bluelightai-dev



