OpenR1-Math-220k

Name: OpenR1-Math-220k
Creator: maas
Published: 2026-05-16 19:36:48
License: 暂无描述

魔搭社区2026-05-16 更新2025-02-15 收录

下载链接：

https://modelscope.cn/datasets/open-r1/OpenR1-Math-220k

下载链接

链接失效反馈

官方服务：

资源简介：

# OpenR1-Math-220k ## Dataset description OpenR1-Math-220k is a large-scale dataset for mathematical reasoning. It consists of 220k math problems with two to four reasoning traces generated by [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) for problems from NuminaMath 1.5. The traces were verified using [Math Verify](https://github.com/huggingface/Math-Verify) for most samples and [Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) as a judge for 12% of the samples, and each problem contains at least one reasoning trace with a correct answer. The dataset consists of two splits: - `default` with 94k problems and that achieves the best performance after SFT. - `extended` with 131k samples where we add data sources like `cn_k12`. This provides more reasoning traces, but we found that the performance after SFT to be lower than the `default` subset, likely because the questions from `cn_k12` are less difficult than other sources. You can load the dataset as follows: ```python from datasets import load_dataset ds = load_dataset("open-r1/OpenR1-Math-220k", "default") ``` ## Dataset curation To build OpenR1-Math-220k, we prompt [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) model to generate solutions for 400k problems from [NuminaMath 1.5](https://huggingface.co/datasets/AI-MO/NuminaMath-1.5) using [SGLang](https://github.com/sgl-project/sglang), the generation code is available [here](https://github.com/huggingface/open-r1/tree/main/slurm). We follow the model card’s recommended generation parameters and prepend the following instruction to the user prompt: `"Please reason step by step, and put your final answer within \boxed{}."` We set a 16k token limit per generation, as our analysis showed that only 75% of problems could be solved in under 8k tokens, and most of the remaining problems required the full 16k tokens. We were able to generate 25 solutions per hour per H100, enabling us to generate 300k problem solutions per day on 512 H100s. We generate two solutions per problem—and in some cases, four—to provide flexibility in filtering and training. This approach allows for rejection sampling, similar to DeepSeek R1’s methodology, and also makes the dataset suitable for preference optimisation methods like DPO. ## License The dataset is licensed under Apache 2.0

# OpenR1-Math-220k ## 数据集说明 OpenR1-Math-220k是一款面向数学推理任务的大规模数据集。它包含22万个数学题目，附带由[DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1)为NuminaMath 1.5中的题目生成的2至4条推理轨迹。大部分样本通过[Math Verify](https://github.com/huggingface/Math-Verify)完成正确性校验，另有12%的样本以[Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)作为评判模型进行验证；每个题目至少包含一条答案正确的推理轨迹。该数据集包含两个子集： - `default`子集包含9.4万个题目，经监督微调（Supervised Fine-Tuning, SFT）后可取得最优性能。 - `extended`子集包含13.1万个样本，我们额外加入了`cn_k12`等数据源，该子集提供了更多推理轨迹，但经监督微调后的性能低于`default`子集，推测原因是`cn_k12`中的题目难度低于其他数据源。你可以通过如下方式加载该数据集： python from datasets import load_dataset ds = load_dataset("open-r1/OpenR1-Math-220k", "default") ## 数据集构建流程为构建OpenR1-Math-220k，我们使用[SGLang](https://github.com/sgl-project/sglang)，通过提示[DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1)模型为[NuminaMath 1.5](https://huggingface.co/datasets/AI-MO/NuminaMath-1.5)中的40万个题目生成解题方案，相关生成代码已公开于[此处](https://github.com/huggingface/open-r1/tree/main/slurm)。我们遵循模型卡片推荐的生成参数，并在用户提示词前追加以下指令： `"请逐步进行推理，并将最终答案置于\boxed{}中。"` 我们将单轮生成的Token上限设为16k，经分析发现仅有75%的题目可在8k Token内完成求解，剩余绝大多数题目需要完整的16k Token空间。单张H100显卡每小时可生成25个题目的解题方案，在使用512张H100显卡的集群下，我们单日可生成30万个题目的解题方案。我们为每个题目生成2条解题轨迹，部分题目生成4条，以在筛选与训练环节提供灵活性。该方法支持类似DeepSeek R1所用的拒绝采样策略，同时也使该数据集适用于如DPO（Direct Preference Optimization, 直接偏好优化）这类偏好优化方法。 ## 许可证本数据集采用Apache 2.0许可证开源。

提供机构：

maas

创建时间：

2025-02-11

搜集汇总

数据集介绍