AM-Math-Difficulty-RL

Name: AM-Math-Difficulty-RL
Creator: maas
Published: 2025-12-05 16:32:40
License: No description available

ModelScope Community; updated 2025-12-05; indexed 2025-05-03

Download link:

https://modelscope.cn/datasets/a-m-team/AM-Math-Difficulty-RL


Resource description:

**For more open-source datasets, models, and methodologies, please visit our [GitHub repository](https://github.com/a-m-team/a-m-models).**

We believe that the selection of training data for reinforcement learning is crucial. To validate this, we conducted several experiments exploring how data difficulty influences training performance. Our data sources originate from numerous excellent open-source projects, and we sincerely appreciate their contributions, without which our current achievements would not have been possible.

Specifically, this repository provides three mathematics datasets with varying difficulty levels for reinforcement learning (RL) of large language models (LLMs). In short, we determine the difficulty levels of mathematical problems based on the pass rates achieved by various models. The datasets and difficulty levels are detailed in the following paper:

> [**How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study**](https://arxiv.org/abs/2504.00829)

In our study, we found that carefully selecting RL training data based on appropriate difficulty metrics is crucial. A moderate level of difficulty enhances learning efficiency: the tasks are challenging enough to learn from, but not so difficult that they overwhelm the learning process. Below are the model's performance results on the test set when trained via RL on mathematical datasets of three distinct difficulty levels:

<img src="data_difficulty.png" alt="alt text" width="60%"/>

Conducting RL training in stages, on datasets of progressively increasing difficulty, can further enhance the performance of LLMs on reasoning tasks. (Since no code-related training data were included, the model's performance on LiveCodeBench remains essentially the same as the base model.)

<img src="staged.png" alt="alt text" width="60%"/>

<img src="staged2.png" alt="alt text" width="60%"/>

## Data Difficulty Criteria

### Difficulty Level 1

Tasks where the DeepSeek-R1-Distill-Qwen-1.5B model partially succeeds (pass rate ∈ (0, 1)).

### Difficulty Level 2

Includes three types of tasks:

- Tasks that the 1.5B model fails (pass rate = 0) but the 7B model fully succeeds (pass rate = 1).
- Tasks that the 1.5B model fails (pass rate = 0) but the 7B model partially succeeds (pass rate ∈ (0, 1)).
- Tasks where both 1.5B and 7B models partially succeed.

### Difficulty Level 3

Tasks selected based on performance of the DeepSeek-R1-Distill-Qwen-32B model:

- Tasks that the 32B model consistently fails.
- 50% of successful tasks are retained.

## Citation

If you use these datasets, please cite the paper:

```bibtex
@misc{ji2025difficultyawarestagedreinforcementlearning,
  title={How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study},
  author={Yunjie Ji and Sitong Zhao and Xiaoyu Tian and Haotian Wang and Shuaiting Chen and Yiping Peng and Han Zhao and Xiangang Li},
  year={2025},
  eprint={2504.00829},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2504.00829},
}
```
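The pass-rate rules under "Data Difficulty Criteria" can be read as simple membership tests. The sketch below is an illustration only, not the authors' pipeline: the function names are hypothetical, pass rates are assumed to be the fraction of sampled attempts judged correct, and the "50% of successful tasks" for Level 3 are assumed to be chosen by seeded random sampling, since the description does not say how they are selected.

```python
import random
from typing import Dict, List, Sequence


def pass_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of sampled attempts judged correct (assumed definition)."""
    if not outcomes:
        raise ValueError("need at least one sampled attempt")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


def is_partial(rate: float) -> bool:
    """True when the pass rate lies strictly between 0 and 1."""
    return 0.0 < rate < 1.0


def in_level_1(rate_1_5b: float) -> bool:
    """Level 1: the 1.5B model partially succeeds."""
    return is_partial(rate_1_5b)


def in_level_2(rate_1_5b: float, rate_7b: float) -> bool:
    """Level 2: 1.5B fails and 7B fully or partially succeeds,
    or both models partially succeed."""
    if rate_1_5b == 0.0:
        return rate_7b > 0.0
    return is_partial(rate_1_5b) and is_partial(rate_7b)


def select_level_3(rates_32b: Dict[str, float], seed: int = 0) -> List[str]:
    """Level 3: keep every task the 32B model always fails, plus half of
    the tasks it solves at least once.

    Assumption: the retained half is drawn by uniform random sampling;
    the source only states that 50% of successful tasks are retained.
    """
    failed = [task for task, rate in rates_32b.items() if rate == 0.0]
    solved = [task for task, rate in rates_32b.items() if rate > 0.0]
    rng = random.Random(seed)
    return failed + rng.sample(solved, len(solved) // 2)
```

Under the criteria as written, a task on which both the 1.5B and 7B models partially succeed satisfies the Level 1 and Level 2 rules at once, which is why the sketch returns per-level membership rather than a single label.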


Provider:

maas

Created:

2025-04-28
