DeepScaleR_Difficulty
Source: ModelScope community (魔搭社区) · updated 2025-11-27 · indexed 2025-05-24
Download link: https://modelscope.cn/datasets/lime-nlp/DeepScaleR_Difficulty
# Difficulty Estimation on DeepScaleR
We annotate the entire [**DeepScaleR**](https://huggingface.co/agentica-org/DeepScaleR-1.5B-Preview) dataset with a **difficulty score** based on the performance of the [Qwen 2.5-MATH-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B) model. This provides an adaptive signal for curriculum construction and model evaluation.
**DeepScaleR** is a curated dataset of 40,000 reasoning-intensive problems used to train and evaluate reinforcement learning-based methods for large language models.
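As an illustration of how a per-problem difficulty score could feed a curriculum, the sketch below picks the problems whose scores lie closest to a target difficulty. It is one possible use under stated assumptions, not the procedure of any particular training recipe: the function name `closest_to_target`, the `target` and `k` parameters, and the example scores are all hypothetical.

```python
from typing import Sequence


def closest_to_target(difficulties: Sequence[float], target: float, k: int) -> list[int]:
    """Return the indices of the k problems whose difficulty is nearest to target.

    Ties are broken by the lower index so the result is deterministic.
    """
    order = sorted(
        range(len(difficulties)),
        key=lambda i: (abs(difficulties[i] - target), i),
    )
    return order[:k]


# Hypothetical difficulty scores on the 0-100 scale used by this dataset.
scores = [5.0, 40.0, 55.0, 90.0, 100.0]

# Select the two problems nearest to a mid-range target difficulty.
batch = closest_to_target(scores, target=50.0, k=2)
print(batch)
```

Raising or lowering `target` as training progresses would shift the selected problems toward harder or easier items.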
## Difficulty Scoring Method
Difficulty scores are estimated using the **Qwen 2.5-MATH-7B** model with the following generation settings:
- `temperature = 0.6`
- `top_p = 0.9`
- `max_tokens = 4096`
- Inference performed using [vLLM](https://github.com/vllm-project/vllm)
- Each problem is attempted **128 times**
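The settings above could be expressed with vLLM roughly as follows. This is a minimal sketch rather than the authors' script: it assumes the 128 attempts are drawn in a single request via `n=128`, and the helper names `sampling_kwargs` and `sample_attempts` are hypothetical. The prompt template and the answer checker are not described in this card, so both are left out.

```python
def sampling_kwargs() -> dict:
    """Generation settings listed in this card, as keyword arguments."""
    return {
        "n": 128,            # assumption: all 128 attempts sampled in one request
        "temperature": 0.6,
        "top_p": 0.9,
        "max_tokens": 4096,
    }


def sample_attempts(prompts: list[str], model_name: str = "Qwen/Qwen2.5-Math-7B") -> list[list[str]]:
    """Sample completions for each prompt; requires vLLM and a suitable GPU."""
    # Imported lazily so the settings helper can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_name)
    params = SamplingParams(**sampling_kwargs())
    outputs = llm.generate(prompts, params)
    # Each request output holds one completion per sampled attempt.
    return [[completion.text for completion in out.outputs] for out in outputs]
```

Each inner list returned by `sample_attempts` would then be graded against the reference answer to count successes.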
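The difficulty score `d_i` for problem `i` is the percentage of failed attempts:
`d_i = 100 × (1 - s_i / 128)`
where `s_i` is the number of successful attempts out of 128. Scores therefore range from 0 (solved on every attempt) to 100 (never solved).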
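The formula above can be checked with a few lines of Python. The function names are illustrative, not part of the dataset. The standard-error helper is an addition that assumes the attempts are independent Bernoulli trials, which the card does not state.

```python
import math

ATTEMPTS = 128  # number of attempts per problem, as stated in this card


def difficulty_score(successes: int, attempts: int = ATTEMPTS) -> float:
    """Difficulty on a 0-100 scale: the percentage of attempts that failed."""
    if not 0 <= successes <= attempts:
        raise ValueError("successes must be between 0 and attempts")
    return 100.0 * (1.0 - successes / attempts)


def score_standard_error(successes: int, attempts: int = ATTEMPTS) -> float:
    """Binomial standard error of the score, assuming independent attempts."""
    p = successes / attempts
    return 100.0 * math.sqrt(p * (1.0 - p) / attempts)


print(difficulty_score(128))  # solved every time
print(difficulty_score(96))   # solved on three quarters of attempts
print(difficulty_score(0))    # never solved
```

With 128 attempts the score moves in steps of 100 / 128 = 0.78125, and under the independence assumption its sampling noise is largest, about 4.4 points, for problems solved half the time.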
Scoring with a mid-range model balances the evaluation signal:
- A **strong model** would trivially solve easy problems, compressing the difficulty scale.
- A **weak model** would fail uniformly, providing poor resolution.
- Qwen 2.5-MATH-7B was selected for its **mid-range capabilities**, offering meaningful gradients across a wide spectrum of problems.
## Difficulty Estimation on Other Datasets
We also apply the same difficulty estimation procedure to the following datasets:
- [Open Reasoner Zero](https://huggingface.co/datasets/lime-nlp/orz_math_difficulty)
- [MATH](https://huggingface.co/datasets/lime-nlp/MATH_difficulty)
- [GSM8K](https://huggingface.co/datasets/lime-nlp/GSM8K_difficulty)
## 📬 Contact
For questions or feedback, feel free to reach out to [**Taiwei Shi**](https://maksimstw.github.io/) at [taiweish@usc.edu](mailto:taiweish@usc.edu).
## 📚 Citations
GitHub: https://github.com/uscnlp-lime/verl
If you find our dataset useful, please cite [Efficient Reinforcement Finetuning via Adaptive Curriculum Learning](https://huggingface.co/papers/2504.05520):
```bibtex
@misc{shi2025efficientreinforcementfinetuningadaptive,
title={Efficient Reinforcement Finetuning via Adaptive Curriculum Learning},
author={Taiwei Shi and Yiyang Wu and Linxin Song and Tianyi Zhou and Jieyu Zhao},
year={2025},
eprint={2504.05520},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2504.05520},
}
```
Provider: maas
Created: 2025-05-23



