OpenR1-Math-Raw

Name: OpenR1-Math-Raw
Creator: maas
Published: 2026-05-03 20:03:14
License: No description available

ModelScope community · Updated 2026-05-03 · Indexed 2025-02-15

Download link:

https://modelscope.cn/datasets/okwinds/OpenR1-Math-Raw

Resource description:

# For a walkthrough of this dataset, see the WeChat official-account article 👇🏻

### 觉察流

- [Open-R1: An In-Depth Look at Progress on the Open-Source Reproduction of DeepSeek-R1](https://mp.weixin.qq.com/s/TxRaI8amE_N__1VU4XHvMg)

> Notice: This dataset is mirrored in full from [open-r1/OpenR1-Math-Raw](https://huggingface.co/datasets/open-r1/OpenR1-Math-Raw) on Hugging Face.
> For more information, see the original repository's dataset card below.

#### Download

For the dataset file metadata and the data files, browse the "Dataset Files" page. You can download the dataset with a Git clone command or with the ModelScope SDK.

# Dataset introduction

# OpenR1-Math-Raw

## Dataset description

OpenR1-Math-Raw is a large-scale dataset for mathematical reasoning. It consists of 516k math problems sourced from [AI-MO/NuminaMath-1.5](https://huggingface.co/datasets/AI-MO/NuminaMath-1.5), each with 1 to 8 reasoning traces generated by [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1). The traces were verified using [Math Verify](https://github.com/huggingface/Math-Verify), but we recommend additionally annotating correctness with an LLM-as-judge for higher recall.

The dataset contains:

- `516,499` problems
- `1,209,403` R1-generated solutions, with 2.3 solutions per problem on average
- `669,493` solutions verified as correct by [Math Verify](https://github.com/huggingface/Math-Verify)

## Dataset curation

We keep only the solutions that fit in the 16k-token budget and follow the `<think>...</think>` reasoning format. Only the non-synthetic problems from NuminaMath-1.5 were used.

For a more curated sample of this dataset and more details, please see [open-r1/OpenR1-Math-220k](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k).

## License

The dataset is licensed under Apache 2.0.
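The `<think>...</think>` format requirement under "Dataset curation" can be illustrated with a small checker. This is only a sketch: the card does not spell out the exact rule, so the function name `split_think_trace` and the criterion used here (exactly one `<think>` block at the start of the text, closed exactly once) are assumptions, not the authors' filter.

```python
import re

# Assumption: a well-formed trace opens with a single <think> block that is
# closed exactly once. The dataset card does not define the rule precisely.
THINK_PATTERN = re.compile(r"^\s*<think>(.*?)</think>(.*)$", re.DOTALL)


def split_think_trace(generation: str):
    """Return (reasoning, final_text), or None if the format is not followed."""
    # Reject traces with missing, unclosed, or repeated tags.
    if generation.count("<think>") != 1 or generation.count("</think>") != 1:
        return None
    match = THINK_PATTERN.match(generation)
    if match is None:
        return None
    reasoning, final_text = match.groups()
    return reasoning.strip(), final_text.strip()


if __name__ == "__main__":
    sample = "<think>2 + 2 = 4</think>The answer is 4."
    print(split_think_trace(sample))
```

A checker like this only tests the tag structure; the 16k-token budget mentioned above would need a tokenizer and is not covered here.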

# OpenR1-Math-Raw

## Dataset description

OpenR1-Math-Raw is a large-scale dataset for mathematical reasoning. It consists of 516,499 math problems sourced from [AI-MO/NuminaMath-1.5](https://huggingface.co/datasets/AI-MO/NuminaMath-1.5), each with 1 to 8 reasoning traces generated by [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1). The traces were verified using [Math Verify](https://github.com/huggingface/Math-Verify) and an LLM-as-judge verifier (Llama-3.3-70B-Instruct).

The dataset contains:

- `516,499` problems
- `1,209,403` R1-generated solutions, with 2.3 solutions per problem on average
- answers re-parsed with Llama-3.3-70B-Instruct (`reparsed_answers`), with the following distribution of correct answers:

| Metric | Correct generations | Total generations | Correct problems | Total problems |
|-----------------------------|---------|---------|---------|---------|
| Math Verify, reparsed answer | 679,358 | 944,106 | 266,654 | 376,956 |
| LLaMA verification | 602,766 | 944,106 | 308,391 | 376,956 |
| Math Verify, original answer | 613,535 | 944,106 | 238,951 | 376,956 |

You can load the dataset as follows:

```python
from datasets import load_dataset

ds = load_dataset("open-r1/OpenR1-Math-Raw", split="train")
```

## Dataset curation

We keep only the solutions that fit in the 16k-token context budget and follow the `<think>...</think>` reasoning format. Only the non-synthetic problems from NuminaMath-1.5 were used.

For a more curated sample of this dataset and more details, please see [open-r1/OpenR1-Math-220k](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k).

## Changelog

### [1.1]

- Added the `reparsed_answers` column, created by meta-llama/Meta-Llama-3.3-70B-Instruct, which is prompted to extract the answer from the `solution` column.
- Added the `correctness` column, which contains the math-verify verification results for the `reparsed_answers` and `answer` columns, as well as the LLM-based verification results from meta-llama/Meta-Llama-3.3-70B-Instruct.

## License

The dataset is licensed under Apache 2.0.
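The counts in the correctness table above are easier to compare as rates. The sketch below only divides the numbers copied from that table; the names `TABLE_COUNTS` and `accuracy_rates` are illustrative, and since the card does not state how a problem is counted as correct, the problem-level rate is just the ratio of the two reported columns.

```python
# Counts copied from the correctness table in the dataset card:
# (correct generations, total generations, correct problems, total problems)
TABLE_COUNTS = {
    "Math Verify, reparsed answer": (679_358, 944_106, 266_654, 376_956),
    "LLaMA verification": (602_766, 944_106, 308_391, 376_956),
    "Math Verify, original answer": (613_535, 944_106, 238_951, 376_956),
}


def accuracy_rates(counts=TABLE_COUNTS):
    """Map each metric to (generation-level rate, problem-level rate)."""
    return {
        metric: (gen_ok / gen_total, prob_ok / prob_total)
        for metric, (gen_ok, gen_total, prob_ok, prob_total) in counts.items()
    }


if __name__ == "__main__":
    for metric, (gen_rate, prob_rate) in accuracy_rates().items():
        print(f"{metric}: {gen_rate:.1%} of generations, {prob_rate:.1%} of problems")
```

On these figures, re-parsing the answers raises the Math Verify rate at both the generation and the problem level relative to the original `answer` field.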

Provider:

maas

Created:

2025-02-13
