Dolci-Think-RL-32B
Source: ModelScope community. Updated 2026-01-07; indexed 2025-12-06.
Download link: https://modelscope.cn/datasets/allenai/Dolci-Think-RL-32B
# Dolci-Think-RL
## Dataset Summary
**Dolci-Think-RL** is a reinforcement learning (RL) dataset for deliberate reasoning, used to train the *Olmo-3-32B-Think* model.
It contains **102,026** high-quality prompts covering:
- Math
- Code
- Precise Instruction Following
- General Chat
This dataset is structurally similar to Dolci-Think-RL-7B but uses a slightly different mixture of sources.
---
## Dataset Composition
### **Total Samples:** 102,026
### **Original Dataset Contribution**
| Source Dataset | Count |
|----------------|-------|
| IF Multi-Constraint | 29,847 |
| OMEGA Math ([paper](https://arxiv.org/abs/2506.18880)) | 15,000 |
| AceCoder ([paper](https://arxiv.org/abs/2502.01718)) | 10,107 |
| Multi-Subject RLVR ([paper](https://arxiv.org/abs/2503.23829v1)) | 8,129 |
| Tulu 3 Rewritten ([paper](https://arxiv.org/abs/2411.15124)) | 8,040 |
| AceReason-Math ([paper](https://arxiv.org/abs/2505.16400)) | 6,599 |
| KlearReasoner Code | 6,176 |
| WildChat English ([paper](https://arxiv.org/abs/2405.01470)) | 4,539 |
| ORZ Math ([paper](https://arxiv.org/abs/2503.24290)) | 3,000 |
| SYNTHETIC-2 / PrimeIntellect ([blog](https://www.primeintellect.ai/blog/synthetic-2)) | 3,000 |
| MathSub-30K (KlearReasoner Math) ([paper](https://arxiv.org/abs/2508.07629)) | 2,999 |
| DAPO-Math ([paper](https://arxiv.org/abs/2503.14476)) | 2,584 |
| Llama-Nemotron Post-Training Dataset ([paper](https://arxiv.org/abs/2505.00949)) | 2,006 |
### **Dataset Source Counts (Grouped Mixes)**
| Mix | Count |
|------|-------|
| Math RLVR Mixture | 30,182 |
| IF RLVR Mixture | 29,847 |
| Code RLVR Mixture | 21,289 |
| General RLVR Mixture | 20,708 |
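The two tables can be cross-checked against each other. The sketch below copies the per-source counts from the first table and groups them into the four mixes. The assignment of each source to a mix is an assumption read off the source lists under "Data Sources & Description" in this card; the tables themselves do not state it.

```python
# Per-source prompt counts, copied from the "Original Dataset Contribution" table.
SOURCE_COUNTS = {
    "IF Multi-Constraint": 29_847,
    "OMEGA Math": 15_000,
    "AceCoder": 10_107,
    "Multi-Subject RLVR": 8_129,
    "Tulu 3 Rewritten": 8_040,
    "AceReason-Math": 6_599,
    "KlearReasoner Code": 6_176,
    "WildChat English": 4_539,
    "ORZ Math": 3_000,
    "SYNTHETIC-2 / PrimeIntellect": 3_000,
    "MathSub-30K": 2_999,
    "DAPO-Math": 2_584,
    "Llama-Nemotron Post-Training Dataset": 2_006,
}

# Assumed source-to-mix assignment (inferred from the card's source lists).
MIXES = {
    "Math RLVR Mixture": ["OMEGA Math", "AceReason-Math", "ORZ Math",
                          "DAPO-Math", "MathSub-30K"],
    "IF RLVR Mixture": ["IF Multi-Constraint"],
    "Code RLVR Mixture": ["AceCoder", "KlearReasoner Code",
                          "SYNTHETIC-2 / PrimeIntellect",
                          "Llama-Nemotron Post-Training Dataset"],
    "General RLVR Mixture": ["Multi-Subject RLVR", "Tulu 3 Rewritten",
                             "WildChat English"],
}

# Mix totals as reported in the "Grouped Mixes" table.
REPORTED_MIX_TOTALS = {
    "Math RLVR Mixture": 30_182,
    "IF RLVR Mixture": 29_847,
    "Code RLVR Mixture": 21_289,
    "General RLVR Mixture": 20_708,
}

def mix_totals() -> dict[str, int]:
    """Sum the per-source counts inside each mix."""
    return {mix: sum(SOURCE_COUNTS[s] for s in sources)
            for mix, sources in MIXES.items()}

grand_total = sum(SOURCE_COUNTS.values())
for mix, total in mix_totals().items():
    print(f"{mix}: {total:,} ({100 * total / grand_total:.1f}% of {grand_total:,})")
```

Under this grouping, the recomputed mix totals match the reported ones and both tables sum to the stated 102,026 prompts.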
---
## Data Sources & Description
### **Instruction Following**
- IFBench/IFEval-derived multi-constraint tasks
- Normalized and filtered
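To illustrate what a multi-constraint task looks like in practice, here is a minimal sketch of an IFEval-style verifier: each constraint is a programmatic check, and a response passes only if every check holds. The constraint types, helper names, and the all-or-nothing rule are illustrative assumptions; this card does not describe the actual constraint set or verifier.

```python
from typing import Callable

# A constraint is any function that maps a response string to pass/fail.
Constraint = Callable[[str], bool]

def min_words(n: int) -> Constraint:
    """Hypothetical constraint: the response has at least n whitespace-separated words."""
    return lambda text: len(text.split()) >= n

def includes_keyword(keyword: str) -> Constraint:
    """Hypothetical constraint: the response mentions the keyword (case-insensitive)."""
    return lambda text: keyword.lower() in text.lower()

def no_commas() -> Constraint:
    """Hypothetical constraint: the response contains no commas."""
    return lambda text: "," not in text

def satisfies_all(response: str, constraints: list[Constraint]) -> bool:
    """Return True only if the response satisfies every constraint."""
    return all(check(response) for check in constraints)

constraints = [min_words(5), includes_keyword("olmo"), no_commas()]
print(satisfies_all("Olmo is a language model trained by Ai2", constraints))
print(satisfies_all("Hi, there", constraints))
```

Because every check is deterministic, the result can serve directly as a verifiable reward signal.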
### **Math Reasoning**
Includes data from:
- OMEGA
- AceReason-Math
- ORZ
- DAPO-Math
- MathSub-30K
Covers algebra, combinatorics, geometry, number theory, proofs, and competition-style problems.
### **Code Reasoning**
Includes:
- AceCoder
- KlearReasoner-Code
- SYNTHETIC-2 (PrimeIntellect)
- Llama-Nemotron Post-Training Dataset
All samples are validated using execution-based filtering.
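Execution-based filtering generally means running a reference solution against its test cases and keeping only the samples that pass. The sketch below shows one way to do this, assuming Python solutions and `assert`-style tests; the function name, the timeout, and the lack of sandboxing are choices made for illustration, not a description of the pipeline actually used.

```python
import os
import subprocess
import sys
import tempfile

def passes_tests(solution: str, tests: list[str], timeout_s: float = 10.0) -> bool:
    """Run solution code followed by assert-style tests in a fresh interpreter.

    Returns True only if the process exits with status 0 within the timeout.
    NOTE: this does not sandbox the code; do not run untrusted code this way.
    """
    program = solution + "\n\n" + "\n".join(tests) + "\n"
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(program)
        path = handle.name
    try:
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            timeout=timeout_s,
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        # Treat non-terminating solutions as failures.
        return False
    finally:
        os.unlink(path)

good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"
tests = ["assert add(1, 2) == 3", "assert add(-1, 1) == 0"]
print(passes_tests(good, tests), passes_tests(bad, tests))
```

A sample whose solution fails or times out would be dropped from the mix under this scheme.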
### **General Long-Form Reasoning**
- Multi-Subject RLVR
- Tulu 3 rewritten (filtered via F1 score)
- WildChat English (topic + character filtering)
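The card does not say which two texts the F1 score compares or which threshold was applied. As a rough sketch, the code below assumes a token-overlap F1 between a candidate answer and a reference answer, with a hypothetical cut-off; both are assumptions for illustration only.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two strings (lowercased, whitespace-tokenized)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

def keep_sample(prediction: str, reference: str, threshold: float = 0.5) -> bool:
    """Hypothetical filter rule: keep a sample when F1 reaches the threshold."""
    return token_f1(prediction, reference) >= threshold

print(token_f1("the cat sat", "the cat"))
```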
---
## Processing & Filtering
- **Keyword & topic filtering**
- **Execution-based test-case validation**
- **F1-score filtering** of rewritten prompts
- **Nemotron difficulty-tier selection**
- **Safety filtering + deduplication**
- **Constraint normalization** for IF tasks
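As an example of the deduplication step, the sketch below removes exact duplicates after light normalization (lowercasing and collapsing whitespace). Whether the actual pipeline used exact or near-duplicate matching is not stated in this card, so the normalization and hashing scheme here are assumptions.

```python
import hashlib

def normalize(prompt: str) -> str:
    """Lowercase and collapse runs of whitespace so trivial variants collide."""
    return " ".join(prompt.lower().split())

def deduplicate(prompts: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized prompt, preserving order."""
    seen: set[str] = set()
    kept: list[str] = []
    for prompt in prompts:
        key = hashlib.sha256(normalize(prompt).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(prompt)
    return kept

print(deduplicate(["Solve x+1=2", "solve  x+1=2 ", "Prove it"]))
```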
---
## License
This dataset is licensed under ODC-BY. It is intended for research and educational use in accordance with [Ai2's Responsible Use Guidelines](https://allenai.org/responsible-use).
## Citation
```
@misc{olmo2025olmo3,
title={Olmo 3},
author={Team Olmo and Allyson Ettinger and Amanda Bertsch and Bailey Kuehl and David Graham and David Heineman and Dirk Groeneveld and Faeze Brahman and Finbarr Timbers and Hamish Ivison and Jacob Morrison and Jake Poznanski and Kyle Lo and Luca Soldaini and Matt Jordan and Mayee Chen and Michael Noukhovitch and Nathan Lambert and Pete Walsh and Pradeep Dasigi and Robert Berry and Saumya Malik and Saurabh Shah and Scott Geng and Shane Arora and Shashank Gupta and Taira Anderson and Teng Xiao and Tyler Murray and Tyler Romero and Victoria Graf and Akari Asai and Akshita Bhagia and Alexander Wettig and Alisa Liu and Aman Rangapur and Chloe Anastasiades and Costa Huang and Dustin Schwenk and Harsh Trivedi and Ian Magnusson and Jaron Lochner and Jiacheng Liu and Lester James V. Miranda and Maarten Sap and Malia Morgan and Michael Schmitz and Michal Guerquin and Michael Wilson and Regan Huff and Ronan Le Bras and Rui Xin and Rulin Shao and Sam Skjonsberg and Shannon Zejiang Shen and Shuyue Stella Li and Tucker Wilde and Valentina Pyatkin and Will Merrill and Yapei Chang and Yuling Gu and Zhiyuan Zeng and Ashish Sabharwal and Luke Zettlemoyer and Pang Wei Koh and Ali Farhadi and Noah A. Smith and Hannaneh Hajishirzi},
year={2025},
eprint={2512.13961},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2512.13961},
}
```
Provider: maas
Created: 2025-11-30



