
open-deepscaler

ModelScope Community | Updated 2025-11-07 | Indexed 2025-03-29
Download link:
https://modelscope.cn/datasets/knoveleng/open-deepscaler
Resource description:
# Open-DeepScaleR Dataset

## Dataset Description

- **Repository**: [knoveleng/open-rs](https://github.com/knoveleng/open-rs)
- **Paper**: [Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn’t](https://arxiv.org/abs/2503.16219)

### Summary

The `open-deepscaler` dataset comprises 21,044 challenging mathematical reasoning problems sourced from the [DeepScaleR dataset](https://huggingface.co/datasets/agentica-org/DeepScaleR-Preview-Dataset). It supports the [Open RS project](https://github.com/knoveleng/open-rs), which aims to enhance reasoning in small LLMs via reinforcement learning.

## Usage

Load the dataset using the Hugging Face `datasets` library:

```python
from datasets import load_dataset

ds = load_dataset("knoveleng/open-deepscaler")["train"]
print(ds[0])
```

## Dataset Structure

### Data Instance

An example entry:

```json
{
  "problem": "Doug constructs a square window using 8 equal-size panes...",
  "solution": "1. Identify pane dimensions: Let each pane be a square with side length \\(s\\). ...",
  "answer": "26",
  "gold_parsed": "[26, '26']",
  "response": "To find the side length, consider the total area split into 8 panes...",
  "answer_parsed": "[50/3, '\\frac{50}{3}']",
  "reward": 0,
  "level": "Hard"
}
```

### Data Fields

- **`problem`**: Mathematical question (string).
- **`solution`**: Detailed solution steps (string).
- **`answer`**: Correct final answer (string).
- **`gold_parsed`**: Correct answer in LaTeX format, parsed by [math_verify](https://github.com/huggingface/Math-Verify) (string).
- **`response`**: Incorrect response from the Qwen2.5-Math-7B-Instruct model (string).
- **`answer_parsed`**: Incorrect answer in LaTeX format, parsed by [math_verify](https://github.com/huggingface/Math-Verify) (string).
- **`reward`**: Reward score (float64); `0` indicates failure by Qwen2.5-Math-7B-Instruct.
- **`level`**: Difficulty level (string); "Hard" corresponds to `reward = 0`.
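The field descriptions suggest a simple way to slice the data by difficulty. The following is a minimal sketch, not part of the original card, that works on plain dictionaries shaped like the example entry. The helper names (`level_counts`, `hard_subset`, `to_prompt_pairs`) and the second sample record, including its `"Easy"` level label, are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical records shaped like the example entry; only the fields used
# below are included. The second record and its "Easy" label are invented.
records = [
    {"problem": "Doug constructs a square window using 8 equal-size panes...",
     "answer": "26", "reward": 0.0, "level": "Hard"},
    {"problem": "What is 1 + 1?", "answer": "2", "reward": 1.0, "level": "Easy"},
]


def level_counts(rows):
    """Count how many rows fall under each difficulty level."""
    return Counter(row["level"] for row in rows)


def hard_subset(rows):
    """Keep rows where the reference model failed (reward equal to 0)."""
    return [row for row in rows if row["reward"] == 0]


def to_prompt_pairs(rows):
    """Reduce rows to (problem, answer) pairs, e.g. prompts and targets."""
    return [(row["problem"], row["answer"]) for row in rows]


print(level_counts(records))
print(to_prompt_pairs(hard_subset(records)))
```

Because iterating over a loaded `datasets.Dataset` yields one dictionary per row, the same helpers should also accept the `ds` object from the usage snippet; for large splits, `ds.filter(lambda row: row["reward"] == 0)` is likely the more efficient way to select the failed cases.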
## Citation

```bibtex
@misc{dang2025reinforcementlearningreasoningsmall,
  title={Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't},
  author={Quy-Anh Dang and Chris Ngo},
  year={2025},
  eprint={2503.16219},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2503.16219},
}
```

Provider:
maas
Created:
2025-03-27