DeepScaleR_Difficulty

Name: DeepScaleR_Difficulty
Creator: maas
Published: 2025-11-27 16:34:46
License: No description provided

ModelScope community: updated 2025-11-27, indexed 2025-05-24

Download link:

https://modelscope.cn/datasets/lime-nlp/DeepScaleR_Difficulty

Description:

# Difficulty Estimation on DeepScaleR

We annotate the entire [**DeepScaleR**](https://huggingface.co/agentica-org/DeepScaleR-1.5B-Preview) dataset with a **difficulty score** based on the performance of the [Qwen 2.5-MATH-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B) model. These scores provide an adaptive signal for curriculum construction and model evaluation.

**DeepScaleR** is a curated dataset of 40,000 reasoning-intensive problems used to train and evaluate reinforcement-learning-based methods for large language models.

## Difficulty Scoring Method

Difficulty scores are estimated with the **Qwen 2.5-MATH-7B** model using the following generation settings:

- `temperature = 0.6`
- `top_p = 0.9`
- `max_tokens = 4096`
- Inference performed with [vLLM](https://github.com/vllm-project/vllm)
- Each problem is attempted **128 times**

The difficulty score $d_i$ for problem $i$ is computed as:

$$
d_i = 100 \times \left(1 - \frac{s_i}{128}\right)
$$

where $s_i$ is the number of successful attempts out of 128. A score of 0 means every attempt succeeded; a score of 100 means none did.

The choice of scoring model determines how informative the scores are:

- A **strong model** would trivially solve easy problems, compressing the difficulty scale.
- A **weak model** would fail uniformly, providing poor resolution.
- Qwen 2.5-MATH-7B was selected for its **mid-range capabilities**, which yield meaningful gradations in difficulty across a wide spectrum of problems.

## Difficulty Estimation on Other Datasets

We also apply the same difficulty estimation procedure to the following datasets:

- [Open Reasoner Zero](https://huggingface.co/datasets/lime-nlp/orz_math_difficulty)
- [MATH](https://huggingface.co/datasets/lime-nlp/MATH_difficulty)
- [GSM8K](https://huggingface.co/datasets/lime-nlp/GSM8K_difficulty)

## 📬 Contact

For questions or feedback, feel free to reach out to [**Taiwei Shi**](https://maksimstw.github.io/) at [taiweish@usc.edu](mailto:taiweish@usc.edu).
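The generation settings in the Difficulty Scoring Method section map directly onto vLLM's `SamplingParams`. The sketch below draws 128 samples per problem and counts successes. It is a minimal illustration under stated assumptions, not the authors' pipeline: prompts are assumed to be already formatted for Qwen2.5-Math-7B, and because the card does not say how a success is judged, `extract_boxed_answer` and `count_successes` are hypothetical helpers that compare the last `\boxed{...}` answer to a reference string by exact match.

```python
import re
from typing import Optional

# Generation settings stated on the dataset card.
NUM_ATTEMPTS = 128
TEMPERATURE = 0.6
TOP_P = 0.9
MAX_TOKENS = 4096


def extract_boxed_answer(text: str) -> Optional[str]:
    """Hypothetical grader step: return the content of the last \\boxed{...}.

    Nested braces (e.g. \\frac{1}{2}) are not handled; a real grader would
    need a proper parser and math-equivalence checking.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def count_successes(completions: list[str], reference: str) -> int:
    """Hypothetical success count: exact match of the extracted answer."""
    target = reference.strip()
    return sum(extract_boxed_answer(c) == target for c in completions)


def sample_completions(prompts: list[str], model: str = "Qwen/Qwen2.5-Math-7B") -> list[list[str]]:
    """Draw NUM_ATTEMPTS samples per prompt with vLLM (needs a GPU and vLLM installed)."""
    # Imported lazily so the helpers above can be used without vLLM.
    from vllm import LLM, SamplingParams

    params = SamplingParams(
        n=NUM_ATTEMPTS,
        temperature=TEMPERATURE,
        top_p=TOP_P,
        max_tokens=MAX_TOKENS,
    )
    llm = LLM(model=model)
    outputs = llm.generate(prompts, params)
    # One list of NUM_ATTEMPTS completion strings per prompt.
    return [[completion.text for completion in out.outputs] for out in outputs]
```

With the success count in hand, the difficulty follows from the formula above, e.g. `100 * (1 - count_successes(samples, reference) / NUM_ATTEMPTS)`.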
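The difficulty formula from the Difficulty Scoring Method section is simple enough to state as code. A minimal Python sketch; the function name `difficulty_score` is illustrative rather than taken from the card, and the number of attempts defaults to the 128 used for this dataset:

```python
def difficulty_score(num_successes: int, num_attempts: int = 128) -> float:
    """Compute d_i = 100 * (1 - successes / attempts)."""
    if num_attempts <= 0:
        raise ValueError("num_attempts must be positive")
    if not 0 <= num_successes <= num_attempts:
        raise ValueError("num_successes must lie in [0, num_attempts]")
    return 100.0 * (1.0 - num_successes / num_attempts)


# A problem solved in 96 of 128 attempts.
print(difficulty_score(96))  # 25.0
```

Because each problem gets 128 attempts, scores move in steps of 100/128 = 0.78125 points, so one extra success lowers $d_i$ by about 0.78.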
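The card positions these scores as a signal for curriculum construction without spelling out a selection rule. One generic option, shown here as a hypothetical sketch rather than the adaptive curriculum method of the cited paper, is to pick the problems whose $d_i$ is closest to a target difficulty; a trainer could then raise or lower the target as the policy improves. The `Problem` record and its field names are assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    # Hypothetical record; the dataset's real column names may differ.
    text: str
    difficulty: float  # d_i on the 0-100 scale


def select_near_target(problems: list[Problem], target: float, batch_size: int) -> list[Problem]:
    """Return the batch_size problems whose difficulty is closest to target."""
    if batch_size < 0:
        raise ValueError("batch_size must be non-negative")
    # Sort by distance to the target; sorted() is stable, so ties keep input order.
    ranked = sorted(problems, key=lambda p: abs(p.difficulty - target))
    return ranked[:batch_size]


pool = [Problem("a", 10.0), Problem("b", 40.0), Problem("c", 55.0), Problem("d", 90.0)]
batch = select_near_target(pool, target=50.0, batch_size=2)
print([p.text for p in batch])  # ['c', 'b']
```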
## 📚 Citations

GitHub: https://github.com/uscnlp-lime/verl

If you find our dataset useful, please cite [Efficient Reinforcement Finetuning via Adaptive Curriculum Learning](https://huggingface.co/papers/2504.05520):

```bibtex
@misc{shi2025efficientreinforcementfinetuningadaptive,
  title={Efficient Reinforcement Finetuning via Adaptive Curriculum Learning},
  author={Taiwei Shi and Yiyang Wu and Linxin Song and Tianyi Zhou and Jieyu Zhao},
  year={2025},
  eprint={2504.05520},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2504.05520},
}
```

Provider: maas

Created: 2025-05-23
