Dolci-Think-RL-7B
ModelScope Community · Updated 2025-12-05 · Indexed 2025-11-22
Download link:
https://modelscope.cn/datasets/allenai/Dolci-Think-RL-7B
## Dataset Summary
**Dolci-Think-RL-7B** is the reinforcement learning dataset used to train the *Olmo-3-7B-Think* model.
It contains **102,014** prompts designed to elicit deep reasoning across:
- Math
- Coding
- Precise Instruction Following
- General Chat

It combines high-quality curated sources, filtered to favor prompts that call for deliberate reasoning.

---
## Dataset Composition
### **Total Samples:** 102,014
### **Contribution by Source Dataset**
| Source Dataset | Count |
|----------------|-------|
| IF Multi-Constraint | 29,813 |
| OMEGA Math ([paper](https://arxiv.org/abs/2506.18880)) | 15,000 |
| AceCoder ([paper](https://arxiv.org/abs/2502.01718)) | 10,107 |
| Tulu 3 Rewritten ([paper](https://arxiv.org/abs/2411.15124)) | 7,109 |
| Multi-Subject RLVR ([paper](https://arxiv.org/abs/2503.23829v1)) | 7,106 |
| AceReason-Math ([paper](https://arxiv.org/abs/2505.16400)) | 6,598 |
| WildChat English ([paper](https://arxiv.org/abs/2405.01470)) | 6,421 |
| KlearReasoner Code | 6,272 |
| SYNTHETIC-2 / PrimeIntellect ([blog](https://www.primeintellect.ai/blog/synthetic-2)) | 3,000 |
| MathSub-30K (KlearReasoner Math) ([paper](https://arxiv.org/abs/2508.07629)) | 2,999 |
| ORZ Math ([paper](https://arxiv.org/abs/2503.24290)) | 2,999 |
| DAPO-Math ([paper](https://arxiv.org/abs/2503.14476)) | 2,584 |
| Llama-Nemotron Post-Training Dataset ([paper](https://arxiv.org/abs/2505.00949)) | 2,006 |
### **Dataset Source Counts (Grouped Mixes)**
| Mix | Count |
|------|-------|
| Math RLVR Mixture | 30,180 |
| IF RLVR Mixture | 29,813 |
| Code RLVR Mixture | 21,385 |
| General RLVR Mixture | 20,636 |
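
The two tables can be cross-checked against each other. The sketch below assumes each source belongs to the mixture whose family lists it under Data Sources & Description (for example, OMEGA, AceReason-Math, ORZ Math, DAPO-Math and MathSub-30K in the Math mixture). Under that grouping, the summed per-source counts reproduce the four mixture counts and the 102,014 total. `SOURCE_COUNTS`, `MIXES` and `mix_totals` are illustrative names, not part of the dataset.

```python
# Per-source prompt counts, copied from the source table above.
SOURCE_COUNTS = {
    "IF Multi-Constraint": 29_813,
    "OMEGA Math": 15_000,
    "AceCoder": 10_107,
    "Tulu 3 Rewritten": 7_109,
    "Multi-Subject RLVR": 7_106,
    "AceReason-Math": 6_598,
    "WildChat English": 6_421,
    "KlearReasoner Code": 6_272,
    "SYNTHETIC-2 / PrimeIntellect": 3_000,
    "MathSub-30K": 2_999,
    "ORZ Math": 2_999,
    "DAPO-Math": 2_584,
    "Llama-Nemotron Post-Training Dataset": 2_006,
}

# Assumed grouping of sources into mixtures, following the families
# listed under "Data Sources & Description".
MIXES = {
    "Math RLVR Mixture": ["OMEGA Math", "AceReason-Math", "ORZ Math", "DAPO-Math", "MathSub-30K"],
    "IF RLVR Mixture": ["IF Multi-Constraint"],
    "Code RLVR Mixture": [
        "AceCoder",
        "KlearReasoner Code",
        "SYNTHETIC-2 / PrimeIntellect",
        "Llama-Nemotron Post-Training Dataset",
    ],
    "General RLVR Mixture": ["Multi-Subject RLVR", "Tulu 3 Rewritten", "WildChat English"],
}


def mix_totals(source_counts, mixes):
    """Sum the per-source counts within each mixture."""
    return {mix: sum(source_counts[s] for s in sources) for mix, sources in mixes.items()}


# Each source should be assigned to exactly one mixture.
assigned = sorted(s for sources in MIXES.values() for s in sources)
assert assigned == sorted(SOURCE_COUNTS), "every source must map to exactly one mixture"

totals = mix_totals(SOURCE_COUNTS, MIXES)
for mix, count in totals.items():
    print(f"{mix}: {count:,}")
print(f"Total: {sum(totals.values()):,}")
```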
---
## Data Sources & Description
### **Instruction Following**
- Up to 5 constraints per prompt
- Derived from IFBench-Train and IFEval-style tasks
- Filtered for clarity and non-toxicity
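
The card does not say how constraints are represented or scored. As an illustration only, the sketch below shows one way an IFEval-style verifiable reward could be computed, assuming each constraint is a Python predicate over the response and the reward is the fraction of constraints satisfied. The constraint types and the names `Constraint`, `min_words`, `include_keyword`, `no_commas`, `ends_with` and `constraint_reward` are hypothetical; only the limit of 5 constraints comes from the card. A fractional score gives partial credit, whereas an all-or-nothing reward would be stricter.

```python
from dataclasses import dataclass
from typing import Callable, List

MAX_CONSTRAINTS = 5  # the card states prompts carry up to 5 constraints


@dataclass
class Constraint:
    """A single verifiable constraint (hypothetical representation)."""
    description: str
    check: Callable[[str], bool]


def min_words(n: int) -> Constraint:
    return Constraint(f"at least {n} words", lambda r: len(r.split()) >= n)


def include_keyword(word: str) -> Constraint:
    return Constraint(f"mention '{word}'", lambda r: word.lower() in r.lower())


def no_commas() -> Constraint:
    return Constraint("no commas", lambda r: "," not in r)


def ends_with(phrase: str) -> Constraint:
    return Constraint(f"end with '{phrase}'", lambda r: r.rstrip().endswith(phrase))


def constraint_reward(response: str, constraints: List[Constraint]) -> float:
    """Return the fraction of constraints the response satisfies, in [0, 1]."""
    if not 1 <= len(constraints) <= MAX_CONSTRAINTS:
        raise ValueError(f"expected 1 to {MAX_CONSTRAINTS} constraints")
    passed = sum(c.check(response) for c in constraints)  # bools sum as ints
    return passed / len(constraints)


# Hypothetical usage: three constraints attached to one prompt.
constraints = [min_words(5), include_keyword("ocean"), no_commas()]
print(constraint_reward("The ocean is calm and blue today.", constraints))  # 1.0
print(constraint_reward("Calm, blue sea.", constraints))  # 0.0
```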
### **Math Reasoning**
- **OMEGA**
- **AceReason-Math**
- **ORZ Math**
- **DAPO-Math**
- **MathSub-30K**
- Wide domain coverage: geometry, algebra, combinatorics, proofs, etc.
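
The card lists the math sources but does not describe how answers are verified. One common approach for verifiable math rewards, shown here only as a hypothetical sketch, extracts the last `\boxed{...}` answer from a response and compares it to a reference answer after light normalization. Real verifiers usually also accept mathematically equivalent forms (for example, `0.5` versus `\frac{1}{2}`), which this exact-match version does not. `extract_boxed`, `normalize` and `math_reward` are illustrative names.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, handling nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)  # matching close brace of \boxed{
        out.append(ch)
        i += 1
    return None  # unbalanced braces


def normalize(ans: str) -> str:
    """Light normalization: remove all whitespace and a trailing period."""
    return "".join(ans.split()).rstrip(".")


def math_reward(response: str, reference: str) -> float:
    """1.0 if the boxed answer matches the reference after normalization, else 0.0."""
    pred = extract_boxed(response)
    if pred is None:
        return 0.0
    return 1.0 if normalize(pred) == normalize(reference) else 0.0


print(extract_boxed("so the answer is \\boxed{\\frac{1}{2}}."))  # \frac{1}{2}
print(math_reward("Therefore \\boxed{ 42 }", "42"))  # 1.0
```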
### **Code Reasoning**
Includes four major families:
- **AceCoder**
- **KlearReasoner-Code**
- **SYNTHETIC-2 / PrimeIntellect**
- **Llama-Nemotron Post-Training Dataset**

All four are filtered via test-case execution.
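
Execution-based filtering can be sketched as follows, assuming each code problem comes with a reference solution and stdin/stdout test cases, and that a problem is kept only if its reference passes every test. The field names `reference` and `tests`, the function names and the 5-second timeout are assumptions, not the dataset's actual schema or pipeline. This version runs code directly with the current Python interpreter; a real pipeline should execute untrusted code in a sandbox.

```python
import subprocess
import sys
from typing import Dict, List, Tuple

TIMEOUT_SECONDS = 5.0  # hypothetical per-test time limit


def passes_tests(solution: str, tests: List[Tuple[str, str]], timeout: float = TIMEOUT_SECONDS) -> bool:
    """Return True if a stdin/stdout Python program passes every (input, expected) pair."""
    for stdin_text, expected in tests:
        try:
            result = subprocess.run(
                [sys.executable, "-c", solution],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False  # treat a timeout as a failure
        if result.returncode != 0 or result.stdout.strip() != expected.strip():
            return False
    return True


def filter_by_execution(problems: List[Dict]) -> List[Dict]:
    """Keep only problems whose reference solution passes all of its tests."""
    return [p for p in problems if passes_tests(p["reference"], p["tests"])]


# Illustrative problems (hypothetical schema).
problems = [
    {
        "id": "add",
        "reference": "a, b = map(int, input().split())\nprint(a + b)",
        "tests": [("1 2\n", "3"), ("10 -4\n", "6")],
    },
    {
        "id": "wrong_expected",
        "reference": "print(int(input()) * 3)",
        "tests": [("2\n", "4")],  # expected output disagrees with the reference
    },
]
print([p["id"] for p in filter_by_execution(problems)])  # ['add']
```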
### **General Long-Form Reasoning**
- Multi-Subject RLVR
- Tulu 3 rewritten (filtered via F1-score)
- WildChat English (filtered for reasoning suitability)
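
The card says the Tulu 3 rewrites were filtered by F1 score but does not say which texts are compared or what cutoff is used. Assuming a SQuAD-style token-overlap F1 between a candidate string and a reference string, such a filter could look like the sketch below. `token_f1`, `keep_rewrite` and the 0.5 threshold are hypothetical. For example, "the cat sat" against "the cat sat down" has precision 1 and recall 3/4, giving F1 = 6/7.

```python
from collections import Counter

F1_THRESHOLD = 0.5  # hypothetical cutoff; the card does not state one


def token_f1(prediction: str, reference: str) -> float:
    """SQuAD-style token-overlap F1 (lowercased, whitespace-tokenized)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return float(pred == ref)  # 1.0 only if both are empty
    common = Counter(pred) & Counter(ref)  # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def keep_rewrite(candidate: str, reference: str, threshold: float = F1_THRESHOLD) -> bool:
    """Keep a rewritten example only if its F1 against the reference clears the threshold."""
    return token_f1(candidate, reference) >= threshold


print(round(token_f1("the cat sat", "the cat sat down"), 4))  # 0.8571
print(keep_rewrite("a b", "c d"))  # False
```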
---
## Processing & Filtering
- **Execution-based code filtering** (test-case validated)
- **Topic filtering** for safety and quality
- **F1-based rewrite filtering** (Tulu 3)
- **Difficulty-tiered Nemotron subsets**
- **Strict deduplication**
- **Constraint normalization**
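
"Strict deduplication" is not described further. A minimal sketch, assuming exact matching after normalizing case, punctuation and whitespace, keeps the first occurrence of each prompt; near-duplicate methods such as MinHash would catch more paraphrases but are not shown. `normalize_prompt` and `deduplicate` are illustrative names. Hashing the normalized text keeps the `seen` set small for long prompts.

```python
import hashlib
import re
from typing import Iterable, List


def normalize_prompt(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace for matching."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)  # drop characters that are neither word chars nor whitespace
    return " ".join(text.split())


def deduplicate(prompts: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each prompt under normalize_prompt."""
    seen = set()
    kept = []
    for p in prompts:
        key = hashlib.sha256(normalize_prompt(p).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(p)
    return kept


prompts = [
    "Write a haiku about rain.",
    "write a haiku about rain",
    "Write a limerick about rain.",
]
print(deduplicate(prompts))  # ['Write a haiku about rain.', 'Write a limerick about rain.']
```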
---
## License
This dataset is licensed under ODC-BY. It is intended for research and educational use in accordance with [Ai2's Responsible Use Guidelines](https://allenai.org/responsible-use).
## Citation
A technical manuscript is forthcoming!
Provider: maas
Created: 2025-11-21



