Dolci-Think-RL-7B

Name: Dolci-Think-RL-7B
Creator: maas
Published: 2025-12-05 16:57:08
License: ODC-BY

Source: ModelScope community. Updated: 2025-12-05. Indexed: 2025-11-22.

Download link:

https://modelscope.cn/datasets/allenai/Dolci-Think-RL-7B


Description:

# Dolci-Think-RL-7B

## Dataset Summary

**Dolci-Think-RL-7B** is the reinforcement learning dataset used to train the *Olmo-3-7B-Think* model. It contains **102,014** prompts designed to elicit deep reasoning across:

- Math
- Coding
- Precise Instruction Following
- General Chat

It blends high-quality curated sources with filtering designed for deliberate reasoning.

---

## Dataset Composition

### **Total Samples:** 102,014

### **Original Dataset Contribution**

| Source Dataset | Count |
|----------------|-------|
| IF Multi-Constraint | 29,813 |
| OMEGA Math ([paper](https://arxiv.org/abs/2506.18880)) | 15,000 |
| AceCoder ([paper](https://arxiv.org/abs/2502.01718)) | 10,107 |
| Tulu 3 Rewritten ([paper](https://arxiv.org/abs/2411.15124)) | 7,109 |
| Multi-Subject RLVR ([paper](https://arxiv.org/abs/2503.23829v1)) | 7,106 |
| AceReason-Math ([paper](https://arxiv.org/abs/2505.16400)) | 6,598 |
| WildChat English ([paper](https://arxiv.org/abs/2405.01470)) | 6,421 |
| KlearReasoner Code | 6,272 |
| SYNTHETIC-2 / PrimeIntellect ([blog](https://www.primeintellect.ai/blog/synthetic-2)) | 3,000 |
| MathSub-30K (KlearReasoner Math) ([paper](https://arxiv.org/abs/2508.07629)) | 2,999 |
| ORZ Math ([paper](https://arxiv.org/abs/2503.24290)) | 2,999 |
| DAPO-Math ([paper](https://arxiv.org/abs/2503.14476)) | 2,584 |
| Llama-Nemotron Post-Training Dataset ([paper](https://arxiv.org/abs/2505.00949)) | 2,006 |

### **Dataset Source Counts (Grouped Mixes)**

| Mix | Count |
|------|-------|
| Math RLVR Mixture | 30,180 |
| IF RLVR Mixture | 29,813 |
| Code RLVR Mixture | 21,385 |
| General RLVR Mixture | 20,636 |

---

## Data Sources & Description

### **Instruction Following**

- Up to 5 constraints
- Derived from IFBench-Train & IFEval-style tasks
- Filtered for clarity and non-toxicity

### **Math Reasoning**

- **OMEGA**
- **AceReason-Math**
- **ORZ Math**
- **DAPO-Math**
- **MathSub-30K**
- Wide domain coverage: geometry, algebra, combinatorics, proofs, etc.

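
The counts in the two composition tables can be cross-checked against the stated total of 102,014. The sketch below is only an arithmetic check on the numbers printed in this card; the dictionary and function names are illustrative, and the grouping of the five math sources follows the Math Reasoning list above.

```python
# Per-source counts copied from the "Original Dataset Contribution" table.
SOURCE_COUNTS = {
    "IF Multi-Constraint": 29_813,
    "OMEGA Math": 15_000,
    "AceCoder": 10_107,
    "Tulu 3 Rewritten": 7_109,
    "Multi-Subject RLVR": 7_106,
    "AceReason-Math": 6_598,
    "WildChat English": 6_421,
    "KlearReasoner Code": 6_272,
    "SYNTHETIC-2 / PrimeIntellect": 3_000,
    "MathSub-30K (KlearReasoner Math)": 2_999,
    "ORZ Math": 2_999,
    "DAPO-Math": 2_584,
    "Llama-Nemotron Post-Training Dataset": 2_006,
}

# Grouped counts copied from the "Dataset Source Counts (Grouped Mixes)" table.
MIX_COUNTS = {
    "Math RLVR Mixture": 30_180,
    "IF RLVR Mixture": 29_813,
    "Code RLVR Mixture": 21_385,
    "General RLVR Mixture": 20_636,
}

# Sources listed under "Math Reasoning" in this card.
MATH_SOURCES = [
    "OMEGA Math",
    "AceReason-Math",
    "ORZ Math",
    "DAPO-Math",
    "MathSub-30K (KlearReasoner Math)",
]


def composition_totals():
    """Return (sum of sources, sum of mixes, sum of math sources)."""
    source_total = sum(SOURCE_COUNTS.values())
    mix_total = sum(MIX_COUNTS.values())
    math_total = sum(SOURCE_COUNTS[name] for name in MATH_SOURCES)
    return source_total, mix_total, math_total


source_total, mix_total, math_total = composition_totals()
print(source_total, mix_total, math_total)

# Share of each mix in the full dataset, as a percentage.
for mix, count in MIX_COUNTS.items():
    print(f"{mix}: {100 * count / mix_total:.1f}%")
```

Both tables sum to the stated 102,014 prompts, and the five math sources add up to the 30,180 reported for the Math RLVR Mixture.
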
### **Code Reasoning**

Includes four major families:

- **AceCoder**
- **KlearReasoner-Code**
- **SYNTHETIC-2 / PrimeIntellect**
- **Llama-Nemotron Post-Training Dataset**

All filtered via test-case execution.

### **General Long-Form Reasoning**

- Multi-Subject RLVR
- Tulu 3 rewritten (filtered via F1-score)
- WildChat English (filtered for reasoning suitability)

---

## Processing & Filtering

- **Execution-based code filtering** (test-case validated)
- **Topic filtering** for safety and quality
- **F1-based rewrite filtering** (Tulu 3)
- **Difficulty-tiered Nemotron subsets**
- **Strict deduplication**
- **Constraint normalization**

---

## License

This dataset is licensed under ODC-BY. It is intended for research and educational use in accordance with [Ai2's Responsible Use Guidelines](https://allenai.org/responsible-use).

## Citation

A technical manuscript is forthcoming!
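
The card lists F1-based rewrite filtering for the Tulu 3 subset but does not say which texts are compared or what cutoff was used. As a rough illustration only, the sketch below assumes a token-overlap F1 between a candidate answer and a reference answer; the field names `answer` and `reference`, the whitespace tokenizer, and the `0.5` threshold are all hypothetical and not taken from the dataset.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two strings (lowercased, whitespace-split)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def keep_by_f1(samples, threshold=0.5):
    """Keep samples whose answer/reference F1 meets the (assumed) threshold."""
    return [
        sample
        for sample in samples
        if token_f1(sample["answer"], sample["reference"]) >= threshold
    ]


# Hypothetical records, for illustration only.
samples = [
    {"answer": "the cat sat", "reference": "the cat"},
    {"answer": "blue", "reference": "a red ball"},
]
kept = keep_by_f1(samples)
print(len(kept))
```

A multiset intersection is used so that a repeated token is not counted as a match more often than it appears in the reference.
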


Provider: maas

Created: 2025-11-21
