open-deepscaler
ModelScope community. Updated 2025-11-07; indexed 2025-03-29.
Download link:
https://modelscope.cn/datasets/knoveleng/open-deepscaler
# Open-DeepScaleR Dataset
## Dataset Description
- **Repository**: [knoveleng/open-rs](https://github.com/knoveleng/open-rs)
- **Paper**: [Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn’t](https://arxiv.org/abs/2503.16219)
### Summary
The `open-deepscaler` dataset comprises 21,044 challenging mathematical reasoning problems, sourced from the [DeepScaleR dataset](https://huggingface.co/datasets/agentica-org/DeepScaleR-Preview-Dataset). It supports the [Open RS project](https://github.com/knoveleng/open-rs), which aims to improve reasoning in small LLMs through reinforcement learning.
## Usage
Load the dataset using the Hugging Face `datasets` library:
```python
from datasets import load_dataset
ds = load_dataset("knoveleng/open-deepscaler")["train"]
print(ds[0])
```
## Dataset Structure
### Data Instance
An example entry:
```json
{
  "problem": "Doug constructs a square window using 8 equal-size panes...",
  "solution": "1. Identify pane dimensions: Let each pane be a square with side length \\(s\\). ...",
  "answer": "26",
  "gold_parsed": "[26, '26']",
  "response": "To find the side length, consider the total area split into 8 panes...",
  "answer_parsed": "[50/3, '\\frac{50}{3}']",
  "reward": 0,
  "level": "Hard"
}
```
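In the entry above, `gold_parsed` and `answer_parsed` are strings that look like a two-element list: a parsed value followed by its quoted LaTeX source. The sketch below pulls out the quoted LaTeX part with a regular expression. It assumes every value follows the `[<value>, '<latex>']` layout seen in the example; the helper name `extract_latex` is hypothetical and is not part of the dataset or of math_verify.

```python
import re
from typing import Optional

# Assumption: the field looks like "[<parsed value>, '<LaTeX source>']",
# as in the example entry; other layouts are not handled.
_LATEX_TAIL = re.compile(r"'((?:[^'\\]|\\.)*)'\s*\]\s*$")


def extract_latex(parsed_field: str) -> Optional[str]:
    """Return the quoted text of the last list element, or None if absent."""
    match = _LATEX_TAIL.search(parsed_field)
    return match.group(1) if match else None


# Values taken from the example entry.
gold = extract_latex("[26, '26']")
predicted = extract_latex(r"[50/3, '\frac{50}{3}']")
print(gold, predicted)
```

A regular expression is used here rather than `ast.literal_eval` because a value such as `50/3` is not a Python literal and would be rejected.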
### Data Fields
- **`problem`**: Mathematical question (string).
- **`solution`**: Detailed solution steps (string).
- **`answer`**: Correct final answer (string).
- **`gold_parsed`**: Correct answer as parsed by [math_verify](https://github.com/huggingface/Math-Verify), stored as a string holding the parsed value and its LaTeX form, e.g. `[26, '26']` (string).
- **`response`**: Incorrect response from the Qwen2.5-Math-7B-Instruct model (string).
- **`answer_parsed`**: Incorrect answer extracted from `response`, parsed by [math_verify](https://github.com/huggingface/Math-Verify) and stored in the same format as `gold_parsed` (string).
- **`reward`**: Reward score (float64); `0` indicates failure by Qwen2.5-Math-7B-Instruct.
- **`level`**: Difficulty level (string); "Hard" corresponds to `reward = 0`.
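As an illustration of how the `reward` and `level` fields relate, here is a minimal sketch that selects failed rows and checks the documented rule that "Hard" rows carry `reward = 0`. The rows are hypothetical stand-ins for the schema above: the second row and its "Easy" label are invented for illustration and are not taken from the dataset.

```python
from collections import Counter

# Hypothetical rows mirroring the documented schema. The second row and the
# "Easy" label are assumptions made for illustration only.
sample_rows = [
    {"problem": "...", "answer": "26", "reward": 0.0, "level": "Hard"},
    {"problem": "...", "answer": "4", "reward": 1.0, "level": "Easy"},
]


def select_failures(rows):
    """Keep rows whose recorded reward is 0, i.e. the model answered wrongly."""
    return [row for row in rows if row["reward"] == 0]


def level_matches_reward(row):
    """Check the documented rule in one direction: a Hard row has reward 0."""
    return row["level"] != "Hard" or row["reward"] == 0


failures = select_failures(sample_rows)
level_counts = Counter(row["level"] for row in sample_rows)
all_consistent = all(level_matches_reward(row) for row in sample_rows)
print(len(failures), dict(level_counts), all_consistent)
```

Because iterating a `datasets.Dataset` yields dictionaries, the same helpers should also work on a loaded split, and the predicate can be passed to `Dataset.filter`, for example `ds.filter(lambda row: row["reward"] == 0)`.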
## Citation
```bibtex
@misc{dang2025reinforcementlearningreasoningsmall,
title={Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't},
author={Quy-Anh Dang and Chris Ngo},
year={2025},
eprint={2503.16219},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2503.16219},
}
```
Provider: maas
Created: 2025-03-27



