Big-Math-RL-Verified

Name: Big-Math-RL-Verified
Creator: maas
Published: 2026-05-02 18:20:39
License: No description available

ModelScope Community · Updated 2026-05-02 · Indexed 2025-03-01

Download link:

https://modelscope.cn/datasets/SynthLabsAI/Big-Math-RL-Verified


Resource description:

# Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models

Big-Math is the largest open-source dataset of high-quality mathematical problems, curated specifically for reinforcement learning (RL) training in language models. With over 250,000 rigorously filtered and verified problems, Big-Math bridges the gap between quality and quantity, establishing a robust foundation for advancing reasoning in LLMs.

<div align="center">
  <a href="https://forms.synthlabs.ai/big-math" style="display: inline-block; background-color: #FFB5D8; color: black; font-weight: 600; border-radius: 9999px; padding: 1.5rem 2rem; text-align: center; font-size: 1.125rem; border: 2px solid black; box-shadow: 4px 4px 0px 0px rgba(0,0,0,1); text-decoration: none; margin-bottom: 1rem; transition: opacity 0.3s;">
    Request Early Access to Private Reasoning Evals ↗
  </a>
</div>

- 📄 [Click here to read the full details of how Big-Math was created in our paper!](https://arxiv.org/abs/2502.17387)
- 💾 [Click here for our GitHub repo](https://github.com/SynthLabsAI/big-math) containing the filters used to create this dataset and the code for reformulating multiple-choice problems into open-ended questions.

---

## 📊 Dataset Details

### Subsets

Big-Math is divided into the following subsets:

| **Subset** | **Number of Problems** |
|---|---|
| Orca-Math | 83,215 |
| cn_k12 | 63,609 |
| olympiads | 33,485 |
| MATH | 9,257 |
| aops_forum | 5,740 |
| GSM8k | 3,254 |
| HARP | 2,996 |
| Omni-MATH | 2,478 |
| amc_aime | 78 |
| Big-Math-Reformulated | 47,010 |
| **Total** | 251,122 |

### Columns

Each problem includes:

- **problem**: The math problem in text form.
- **answer**: A closed-form, verifiable answer.
- **source**: The dataset that the problem was sourced from.
- **domain**: The mathematics domain of the problem (e.g., sequences and series).
- **llama8b_solve_rate**: The percent of Llama-3.1-8B rollouts that succeed (out of 64).
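As a rough illustration of how these columns might be used, the sketch below models one row and selects problems inside a target difficulty band via `llama8b_solve_rate`. The `Problem` class, the `solve_rate` and `select_by_solve_rate` helpers, and the sample rows are all hypothetical. The sketch also assumes the solve rate is stored as a fraction in [0, 1] (successes divided by 64 rollouts); that assumption should be checked against the actual data before use.

```python
from dataclasses import dataclass

ROLLOUTS_8B = 64  # Llama-3.1-8B rollouts per problem, per the dataset card


@dataclass(frozen=True)
class Problem:
    """One row mirroring the five documented columns (hypothetical container)."""
    problem: str
    answer: str
    source: str
    domain: str
    llama8b_solve_rate: float  # ASSUMPTION: a fraction in [0, 1]


def solve_rate(successes: int, rollouts: int = ROLLOUTS_8B) -> float:
    """Fraction of rollouts that reached the reference answer."""
    if not 0 <= successes <= rollouts:
        raise ValueError("successes must be between 0 and rollouts")
    return successes / rollouts


def select_by_solve_rate(rows, low: float, high: float):
    """Keep rows whose solve rate lies in the closed interval [low, high]."""
    return [r for r in rows if low <= r.llama8b_solve_rate <= high]


# Made-up sample rows, for illustration only.
rows = [
    Problem("What is 2 + 3?", "5", "GSM8k", "arithmetic", solve_rate(60)),
    Problem("Sum the first 10 positive integers.", "55", "MATH",
            "sequences and series", solve_rate(32)),
    Problem("A hard olympiad-style question.", "7", "olympiads",
            "number theory", solve_rate(2)),
]

# Keep only medium-difficulty problems (solved in 20% to 80% of rollouts).
medium = select_by_solve_rate(rows, 0.2, 0.8)
print([r.source for r in medium])
```

A band filter like this is one simple way to build a curriculum: very high solve rates give little learning signal, while very low ones may rarely yield a reward.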
---

## 📋 Dataset Description

Big-Math was created to address the limitations of existing math datasets for reinforcement learning, which often force researchers to choose between quality and quantity. Key features of Big-Math include:

- **Uniquely verifiable solutions**: Problems with a single correct, verifiable answer.
- **Open-ended problem formulations**: Problems requiring reasoning instead of guesswork.
- **Closed-form solutions**: Problems with answers expressible in clear, closed-form expressions.

Additionally, we provide a new source of 47,000 problems, **Big-Math-Reformulated**: open-ended questions reformulated from multiple-choice problems.

---

## 🔍 Dataset Creation

Big-Math is curated using rigorous filtering and cleaning processes to ensure the inclusion of high-quality problems suitable for RL training of LLMs. Below are the key filters and procedures (see the paper for full details):

- Deduplication (exact matching and semantic deduplication)
- Test set decontamination (using MATH-500 and Omni-MATH test sets)
- Remove non-English problems
- Remove problems with hyperlinks
- Remove problems that are unsolvable in 8 rollouts from Llama-3.1-405B or 64 rollouts from Llama-3.1-8B (excluded from this filter are all problems in HARP, Omni-MATH, MATH, and GSM8k)
- Remove multiple-choice problems
- Remove yes/no and true/false problems
- Remove multi-part questions
- Remove questions asking for a proof
- Clean miscellaneous unnecessary information (e.g., problem scoring)

---

## ⚙️ Big-Math-Reformulated

Big-Math-Reformulated was created by transforming multiple-choice questions into open-ended formats. This was done through a 4-step process:

- Key Information Extraction: Identify core mathematical concepts and rephrasing strategies.
- Reformulation: Rewrite questions into open-ended forms using the key information as a guide.
- Judgment: Ensure the reformulated problems maintain their mathematical integrity and uniqueness.
- Verification: Check that the full process succeeded.

<p align="left">
  <img src="big_math_reformulation.png" width="85%">
</p>

## 🧩 Dataset Difficulty

The Llama-3.1-8B solve rate column can be used as a measure of problem difficulty. Below we plot the difficulty by source dataset and by mathematics domain.

<p align="left">
  <img src="llama8b_solve_rate_distribution.png" width="85%">
</p>

<p align="left">
  <img src="llama8b_solve_rate_by_domain.png" width="85%">
</p>

---

## ⚠️ Unverified Dataset Subset

We have published an additional subset containing problems that were unsolvable within the allotted rollouts from the Llama-3.1-405B or Llama-3.1-8B models. This subset is considered "unverified," as it may contain incorrect, unparseable, or otherwise problematic question-answer pairs. Researchers interested in more challenging or uncertain problems may find this subset useful.

🔗 [Access the Unverified Subset (SynthLabsAI/Big-Math-RL-UNVERIFIED)](https://huggingface.co/datasets/SynthLabsAI/Big-Math-RL-UNVERIFIED)

---

## Citation

If you use this dataset in your work, please cite us using the citation below:

```bibtex
@misc{albalak2025bigmathlargescalehighqualitymath,
  title={Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models},
  author={Alon Albalak and Duy Phung and Nathan Lile and Rafael Rafailov and Kanishk Gandhi and Louis Castricato and Anikait Singh and Chase Blagden and Violet Xiang and Dakota Mahan and Nick Haber},
  year={2025},
  eprint={2502.17387},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2502.17387},
}
```
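The split between the verified data and the unverified subset follows from the rollout-based solvability filter listed under Dataset Creation. The sketch below shows one possible reading of that rule; it is not the authors' implementation. The `is_verified` and `partition` helpers and the sample records are hypothetical, and the way results from the two models are combined (a problem counts as solvable if either model succeeds at least once) is an assumption that the paper may define differently.

```python
from typing import Optional

# Sources the dataset card lists as excluded from the solvability filter.
EXEMPT_SOURCES = {"HARP", "Omni-MATH", "MATH", "GSM8k"}


def is_verified(source: str,
                successes_405b: Optional[int],
                successes_8b: Optional[int]) -> bool:
    """Return True if a problem would stay in the verified set.

    ASSUMPTION: a problem counts as solvable when at least one rollout
    from either model reached the reference answer. None means that
    model was not run on the problem.
    """
    if source in EXEMPT_SOURCES:
        return True
    return (successes_405b or 0) > 0 or (successes_8b or 0) > 0


def partition(records):
    """Split (source, successes_405b, successes_8b) tuples into two lists."""
    verified, unverified = [], []
    for rec in records:
        (verified if is_verified(*rec) else unverified).append(rec)
    return verified, unverified


# Made-up rollout counts, for illustration only.
records = [
    ("olympiads", 0, 0),    # never solved: goes to the unverified split
    ("cn_k12", None, 12),   # solved in 12 of 64 Llama-3.1-8B rollouts
    ("MATH", 0, 0),         # exempt source: kept regardless
    ("aops_forum", 3, 0),   # solved in 3 of 8 Llama-3.1-405B rollouts
]

verified, unverified = partition(records)
print(len(verified), len(unverified))
```

Under this reading, a problem lands in the unverified split only when no rollout succeeded and its source is not exempt, which is why that subset can mix genuinely hard problems with ones whose reference answers are wrong or unparseable.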


Provider:

maas

Created:

2025-02-26
