Dolci-Think-RL

Name: Dolci-Think-RL
Creator: maas
Published: 2025-12-05 16:57:09
License: No description provided

ModelScope community · Updated 2025-12-05 · Indexed 2025-11-22

Download link:

https://modelscope.cn/datasets/allenai/Dolci-Think-RL

Resource description:

# Dolci-Think-RL

## Dataset Summary

**Dolci-Think-RL** is a deliberate reasoning RL dataset used for training the *Olmo-3-32B-Think* model. It contains **102,026** high-quality prompts covering:

- Math
- Code
- Precise Instruction Following
- General Chat

This dataset is structurally similar to Dolci-Think-RL-7B but with slightly different mixtures.

---

## Dataset Composition

### **Total Samples:** 102,026

### **Original Dataset Contribution**

| Source Dataset | Count |
|----------------|-------|
| IF Multi-Constraint | 29,847 |
| OMEGA Math ([paper](https://arxiv.org/abs/2506.18880)) | 15,000 |
| AceCoder ([paper](https://arxiv.org/abs/2502.01718)) | 10,107 |
| Multi-Subject RLVR ([paper](https://arxiv.org/abs/2503.23829v1)) | 8,129 |
| Tulu 3 Rewritten ([paper](https://arxiv.org/abs/2411.15124)) | 8,040 |
| AceReason-Math ([paper](https://arxiv.org/abs/2505.16400)) | 6,599 |
| KlearReasoner Code | 6,176 |
| WildChat English ([paper](https://arxiv.org/abs/2405.01470)) | 4,539 |
| ORZ Math ([paper](https://arxiv.org/abs/2503.24290)) | 3,000 |
| SYNTHETIC-2 / PrimeIntellect ([blog](https://www.primeintellect.ai/blog/synthetic-2)) | 3,000 |
| MathSub-30K (KlearReasoner Math) ([paper](https://arxiv.org/abs/2508.07629)) | 2,999 |
| DAPO-Math ([paper](https://arxiv.org/abs/2503.14476)) | 2,584 |
| Llama-Nemotron Post-Training Dataset ([paper](https://arxiv.org/abs/2505.00949)) | 2,006 |

### **Dataset Source Counts (Grouped Mixes)**

| Mix | Count |
|------|-------|
| Math RLVR Mixture | 30,182 |
| IF RLVR Mixture | 29,847 |
| Code RLVR Mixture | 21,289 |
| General RLVR Mixture | 20,708 |

---

## Data Sources & Description

### **Instruction Following**

- IFBench/IFEval-derived multi-constraint tasks
- Normalized and filtered

### **Math Reasoning**

Includes data from:

- OMEGA
- AceReason-Math
- ORZ
- DAPO-Math
- MathSub-30K

Covers algebra, combinatorics, geometry, number theory, proofs, and competition-style problems.

### **Code Reasoning**

Includes:

- AceCoder
- KlearReasoner-Code
- SYNTHETIC-2 (PrimeIntellect)
- Llama-Nemotron Post-Training Dataset

All validated using execution-based filtering.

### **General Long-Form Reasoning**

- Multi-Subject RLVR
- Tulu 3 rewritten (filtered via F1 score)
- WildChat English (topic + character filtering)

---

## Processing & Filtering

- **Keyword & topic filtering**
- **Execution-based test-case validation**
- **F1-score filtering** of rewritten prompts
- **Nemotron difficulty-tier selection**
- **Safety filtering + deduplication**
- **Constraint normalization** for IF tasks

---

## License

This dataset is licensed under ODC-BY. It is intended for research and educational use in accordance with [Ai2's Responsible Use Guidelines](https://allenai.org/responsible-use).

## Citation

A technical manuscript is forthcoming!
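The per-source counts and the grouped mix counts in the Dataset Composition section can be cross-checked against each other. The sketch below is not part of the official dataset tooling. The numbers are copied from the two tables, while the assignment of sources to mixes (`MIX_MEMBERS`) is an assumption inferred from the "Data Sources & Description" section. The names `SOURCE_COUNTS`, `MIX_MEMBERS`, and `mix_totals` are made up for illustration.

```python
# Per-source prompt counts, copied from the "Original Dataset Contribution" table.
SOURCE_COUNTS = {
    "IF Multi-Constraint": 29_847,
    "OMEGA Math": 15_000,
    "AceCoder": 10_107,
    "Multi-Subject RLVR": 8_129,
    "Tulu 3 Rewritten": 8_040,
    "AceReason-Math": 6_599,
    "KlearReasoner Code": 6_176,
    "WildChat English": 4_539,
    "ORZ Math": 3_000,
    "SYNTHETIC-2 / PrimeIntellect": 3_000,
    "MathSub-30K (KlearReasoner Math)": 2_999,
    "DAPO-Math": 2_584,
    "Llama-Nemotron Post-Training Dataset": 2_006,
}

# ASSUMPTION: source-to-mix assignment inferred from the
# "Data Sources & Description" section, not stated as a mapping in the card.
MIX_MEMBERS = {
    "Math RLVR Mixture": [
        "OMEGA Math",
        "AceReason-Math",
        "ORZ Math",
        "DAPO-Math",
        "MathSub-30K (KlearReasoner Math)",
    ],
    "IF RLVR Mixture": ["IF Multi-Constraint"],
    "Code RLVR Mixture": [
        "AceCoder",
        "KlearReasoner Code",
        "SYNTHETIC-2 / PrimeIntellect",
        "Llama-Nemotron Post-Training Dataset",
    ],
    "General RLVR Mixture": [
        "Multi-Subject RLVR",
        "Tulu 3 Rewritten",
        "WildChat English",
    ],
}

# Counts copied from the "Dataset Source Counts (Grouped Mixes)" table.
REPORTED_MIX_COUNTS = {
    "Math RLVR Mixture": 30_182,
    "IF RLVR Mixture": 29_847,
    "Code RLVR Mixture": 21_289,
    "General RLVR Mixture": 20_708,
}
REPORTED_TOTAL = 102_026


def mix_totals(source_counts, mix_members):
    """Sum the per-source counts inside each mix."""
    return {
        mix: sum(source_counts[name] for name in members)
        for mix, members in mix_members.items()
    }


if __name__ == "__main__":
    totals = mix_totals(SOURCE_COUNTS, MIX_MEMBERS)
    for mix, total in totals.items():
        status = "ok" if total == REPORTED_MIX_COUNTS[mix] else "MISMATCH"
        share = total / REPORTED_TOTAL
        print(f"{mix}: {total:,} ({share:.1%}) {status}")
    print(f"All sources: {sum(SOURCE_COUNTS.values()):,}")
```

Under this assumed grouping, each mix total equals the reported mix count, and both tables add up to the stated 102,026 samples.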

Provider:

maas

Created:

2025-11-21
