five

dolma3_dolmino_mix-10B-1025

收藏
魔搭社区2026-01-07 更新2025-11-29 收录
下载链接:
https://modelscope.cn/datasets/allenai/dolma3_dolmino_mix-10B-1025
下载链接
链接失效反馈
官方服务:
资源简介:
<img alt="Logo for Dolmino Mix" src="dolmino-mix.png" width="289px" style="margin-left:'auto' margin-right:'auto' display:'block'"> # Dolma 3 dolmino dataset mix (10B) for Olmo3 stage 2 annealing training. Smaller mixture of high-quality data. 10B is the size at which we ran all micro anneals for Olmo 3 stage 2 training. For the full mixture, please see: https://huggingface.co/datasets/allenai/dolma3_dolmino_mix-100B-1025 ## Licensing Information Dolma 3 Dolmino is licensed under the Open Data Commons Attribution License v1.0 (ODC-By). It is intended for research and educational use. For more information, please see our [Responsible Use Guidelines](https://allenai.org/responsible-use). ## Citation ``` @misc{olmo2025olmo3, title={Olmo 3}, author={Team Olmo and Allyson Ettinger and Amanda Bertsch and Bailey Kuehl and David Graham and David Heineman and Dirk Groeneveld and Faeze Brahman and Finbarr Timbers and Hamish Ivison and Jacob Morrison and Jake Poznanski and Kyle Lo and Luca Soldaini and Matt Jordan and Mayee Chen and Michael Noukhovitch and Nathan Lambert and Pete Walsh and Pradeep Dasigi and Robert Berry and Saumya Malik and Saurabh Shah and Scott Geng and Shane Arora and Shashank Gupta and Taira Anderson and Teng Xiao and Tyler Murray and Tyler Romero and Victoria Graf and Akari Asai and Akshita Bhagia and Alexander Wettig and Alisa Liu and Aman Rangapur and Chloe Anastasiades and Costa Huang and Dustin Schwenk and Harsh Trivedi and Ian Magnusson and Jaron Lochner and Jiacheng Liu and Lester James V. Miranda and Maarten Sap and Malia Morgan and Michael Schmitz and Michal Guerquin and Michael Wilson and Regan Huff and Ronan Le Bras and Rui Xin and Rulin Shao and Sam Skjonsberg and Shannon Zejiang Shen and Shuyue Stella Li and Tucker Wilde and Valentina Pyatkin and Will Merrill and Yapei Chang and Yuling Gu and Zhiyuan Zeng and Ashish Sabharwal and Luke Zettlemoyer and Pang Wei Koh and Ali Farhadi and Noah A. Smith and Hannaneh Hajishirzi}, year={2025}, eprint={2512.13961}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2512.13961}, } ```

<img alt="Dolmino Mix 数据集标识" src="dolmino-mix.png" width="289px" style="margin-left:'auto' margin-right:'auto' display:'block'"> # 用于Olmo3第二阶段退火训练的Dolma 3 dolmino数据集混合集(10B) 该数据集为高质量数据的精简混合集,10B为我们针对Olmo3第二阶段退火训练开展所有微退火实验所用的数据规模。如需完整混合集,请访问:https://huggingface.co/datasets/allenai/dolma3_dolmino_mix-100B-1025 ## 授权信息 Dolma 3 Dolmino数据集采用开放数据署名许可协议v1.0(Open Data Commons Attribution License v1.0,ODC-By)进行授权。本数据集仅用于研究与教育用途。如需了解更多信息,请参阅我们的[负责任使用指南](https://allenai.org/responsible-use)。 ## 引用 @misc{olmo2025olmo3, title={Olmo 3}, author={Team Olmo and Allyson Ettinger and Amanda Bertsch and Bailey Kuehl and David Graham and David Heineman and Dirk Groeneveld and Faeze Brahman and Finbarr Timbers and Hamish Ivison and Jacob Morrison and Jake Poznanski and Kyle Lo and Luca Soldaini and Matt Jordan and Mayee Chen and Michael Noukhovitch and Nathan Lambert and Pete Walsh and Pradeep Dasigi and Robert Berry and Saumya Malik and Saurabh Shah and Scott Geng and Shane Arora and Shashank Gupta and Taira Anderson and Teng Xiao and Tyler Murray and Tyler Romero and Victoria Graf and Akari Asai and Akshita Bhagia and Alexander Wettig and Alisa Liu and Aman Rangapur and Chloe Anastasiades and Costa Huang and Dustin Schwenk and Harsh Trivedi and Ian Magnusson and Jaron Lochner and Jiacheng Liu and Lester James V. Miranda and Maarten Sap and Malia Morgan and Michael Schmitz and Michal Guerquin and Michael Wilson and Regan Huff and Ronan Le Bras and Rui Xin and Rulin Shao and Sam Skjonsberg and Shannon Zejiang Shen and Shuyue Stella Li and Tucker Wilde and Valentina Pyatkin and Will Merrill and Yapei Chang and Yuling Gu and Zhiyuan Zeng and Ashish Sabharwal and Luke Zettlemoyer and Pang Wei Koh and Ali Farhadi and Noah A. Smith and Hannaneh Hajishirzi}, year={2025}, eprint={2512.13961}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2512.13961}, }
提供机构:
maas
创建时间:
2025-11-21
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作