open-deepscaler

Name: open-deepscaler
Creator: maas
Published: 2025-11-07 16:28:05
License: No description available

ModelScope Community. Updated 2025-11-07. Indexed 2025-03-29.

Download link:

https://modelscope.cn/datasets/knoveleng/open-deepscaler


Resource description:

# Open-DeepScaleR Dataset

## Dataset Description

- **Repository**: [knoveleng/open-rs](https://github.com/knoveleng/open-rs)
- **Paper**: [Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn’t](https://arxiv.org/abs/2503.16219)

### Summary

The `open-deepscaler` dataset comprises 21,044 challenging mathematical reasoning problems, sourced from the [DeepScaleR dataset](https://huggingface.co/datasets/agentica-org/DeepScaleR-Preview-Dataset). It supports the [Open RS project](https://github.com/knoveleng/open-rs), which aims to enhance reasoning in small LLMs via reinforcement learning.

## Usage

Load the dataset using the Hugging Face `datasets` library:

```python
from datasets import load_dataset

# Load the "train" split and print the first record
ds = load_dataset("knoveleng/open-deepscaler")["train"]
print(ds[0])
```

## Dataset Structure

### Data Instance

An example entry:

```json
{
  "problem": "Doug constructs a square window using 8 equal-size panes...",
  "solution": "1. Identify pane dimensions: Let each pane be a square with side length \\(s\\). ...",
  "answer": "26",
  "gold_parsed": "[26, '26']",
  "response": "To find the side length, consider the total area split into 8 panes...",
  "answer_parsed": "[50/3, '\\frac{50}{3}']",
  "reward": 0,
  "level": "Hard"
}
```

### Data Fields

- **`problem`**: Mathematical question (string).
- **`solution`**: Detailed solution steps (string).
- **`answer`**: Correct final answer (string).
- **`gold_parsed`**: Correct answer in LaTeX format, parsed by [math_verify](https://github.com/huggingface/Math-Verify) (string).
- **`response`**: Incorrect response from the Qwen2.5-Math-7B-Instruct model (string).
- **`answer_parsed`**: Incorrect answer in LaTeX format, parsed by [math_verify](https://github.com/huggingface/Math-Verify) (string).
- **`reward`**: Reward score (float64); `0` indicates failure by Qwen2.5-Math-7B-Instruct.
- **`level`**: Difficulty level (string); "Hard" corresponds to `reward = 0`.
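The field descriptions above suggest a few routine checks: selecting the "Hard" rows, confirming that `level == "Hard"` lines up with `reward == 0`, and pulling the LaTeX string out of a parsed-answer field. The following is a minimal sketch on in-memory rows rather than the downloaded dataset. The helper names (`is_failure`, `select_hard`, `inconsistent_rows`, `latex_of`) are hypothetical, the second sample row (including its "Easy" label) is invented for illustration, and `latex_of` assumes the `"[value, 'latex']"` layout seen in the example entry, which may not hold for every record.

```python
import re

# Rows shaped like the "Data Instance" example. The second row, including its
# "Easy" label and all of its values, is hypothetical and only for illustration.
sample_rows = [
    {
        "problem": "Doug constructs a square window using 8 equal-size panes...",
        "answer": "26",
        "gold_parsed": "[26, '26']",
        "answer_parsed": "[50/3, '\\frac{50}{3}']",
        "reward": 0.0,
        "level": "Hard",
    },
    {
        "problem": "Placeholder problem (hypothetical)",
        "answer": "4",
        "gold_parsed": "[4, '4']",
        "answer_parsed": "[4, '4']",
        "reward": 1.0,
        "level": "Easy",
    },
]


def is_failure(row):
    """True when the row records a failed attempt (reward equal to 0)."""
    return float(row["reward"]) == 0.0


def select_hard(rows):
    """Keep only the rows whose difficulty level is "Hard"."""
    return [row for row in rows if row["level"] == "Hard"]


def inconsistent_rows(rows):
    """Rows where level == "Hard" does not coincide with reward == 0."""
    return [row for row in rows if (row["level"] == "Hard") != is_failure(row)]


def latex_of(parsed):
    """Return the last single-quoted item of a parsed-answer string, or None.

    Assumes the "[value, 'latex']" layout shown in the example entry.
    """
    match = re.search(r"'([^']*)'\]\s*$", parsed)
    return match.group(1) if match else None


hard = select_hard(sample_rows)
print(len(hard), latex_of(hard[0]["answer_parsed"]))
```

Because iterating a `datasets.Dataset` yields one dictionary per row, the same helpers should also accept the `ds` object loaded in the Usage section, although `ds.filter(lambda row: row["level"] == "Hard")` is likely the more efficient way to select rows at full scale.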
## Citation

```bibtex
@misc{dang2025reinforcementlearningreasoningsmall,
  title={Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't},
  author={Quy-Anh Dang and Chris Ngo},
  year={2025},
  eprint={2503.16219},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2503.16219},
}
```


Provider:

maas

Created:

2025-03-27

Collected Summary

Dataset Introduction