OpenR1-Math-Raw
ModelScope community · Updated 2026-05-03 · Indexed 2025-02-15
Download link:
https://modelscope.cn/datasets/okwinds/OpenR1-Math-Raw
Description:
# For a walkthrough of this dataset, see the WeChat article below 👇🏻
### <img src="https://www.modelscope.cn/datasets/okwinds/Human-Like-DPO-Dataset/resolve/master/wechat.png" width="30" height="30" align="absmiddle"> 觉察流 - [Open-R1: A Deep Dive into the Progress of the DeepSeek-R1 Open-Source Reproduction](https://mp.weixin.qq.com/s/TxRaI8amE_N__1VU4XHvMg)
> <span style="color:red;font-size:16px"> Notice: this dataset is mirrored in full from [open-r1/OpenR1-Math-Raw](https://huggingface.co/datasets/open-r1/OpenR1-Math-Raw) on Hugging Face. <br/>For more information, read on 👇🏻; the original repository's dataset card is reproduced below.</span>
<br/>
#### _Repository author: scan the QR code below 👇🏻_
<img src="https://www.modelscope.cn/models/okwinds/GPT-2/resolve/master/qrcode_for_jcl_258.jpg" />
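#### How to download
The dataset's file metadata and data files are available on the "Dataset Files" page.
You can download the dataset with the Git clone command below or with the ModelScope SDK.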
:modelscope-code[]{type="sdk"}
:modelscope-code[]{type="git"}
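For readers viewing this page without the rendered download widgets, here is a minimal sketch of both routes in Python. It assumes the dataset ID `okwinds/OpenR1-Math-Raw` taken from the download link, that the Git URL follows the usual `https://www.modelscope.cn/datasets/<id>.git` pattern, and that the `modelscope` package is installed; the helper names are made up for this example.

```python
import subprocess

DATASET_ID = "okwinds/OpenR1-Math-Raw"  # taken from the download link on this page


def git_clone_url(dataset_id: str = DATASET_ID) -> str:
    """Build the Git URL (assumed pattern for ModelScope dataset repositories)."""
    return f"https://www.modelscope.cn/datasets/{dataset_id}.git"


def download_with_git(dataset_id: str = DATASET_ID,
                      target_dir: str = "OpenR1-Math-Raw") -> None:
    """Clone the dataset repository; requires `git` on the PATH and network access."""
    subprocess.run(["git", "clone", git_clone_url(dataset_id), target_dir], check=True)


def download_with_sdk(dataset_id: str = DATASET_ID, split: str = "train"):
    """Load the dataset through the ModelScope SDK (`pip install modelscope`)."""
    # Imported lazily so the rest of this module works without the SDK installed.
    from modelscope.msdatasets import MsDataset
    return MsDataset.load(dataset_id, split=split)


# Example usage (needs network access, so it is left commented out):
# ds = download_with_sdk()
# download_with_git()
```

Neither helper runs on import; call whichever route suits your environment.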
# Dataset introduction
# OpenR1-Math-Raw
## Dataset description
OpenR1-Math-Raw is a large-scale dataset for mathematical reasoning. It consists of 516k math problems sourced from [AI-MO/NuminaMath-1.5](https://huggingface.co/datasets/AI-MO/NuminaMath-1.5) with 1 to 8 reasoning traces generated by [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1).
The traces were verified using [Math Verify](https://github.com/huggingface/Math-Verify), but we recommend additionally annotating the correctness with LLM-as-judge for higher recall.
The dataset contains:
- `516,499` problems
- `1,209,403` R1-generated solutions, with 2.3 solutions per problem on average
- `669,493` solutions verified as correct by [Math Verify](https://github.com/huggingface/Math-Verify)
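As a quick sanity check, the headline figures above can be related to each other with a few lines of arithmetic. The variable names are illustrative; the numbers are the ones listed.

```python
# Counts quoted in the list above.
problems = 516_499
solutions = 1_209_403
verified = 669_493

# Average number of R1 solutions per problem.
avg_solutions = solutions / problems
# Share of all solutions that Math Verify marked as correct.
verified_share = verified / solutions

print(f"{avg_solutions:.2f} solutions per problem")                    # 2.34
print(f"{verified_share:.1%} of solutions verified by Math Verify")    # 55.4%
```

The average of about 2.34 agrees with the rounded "2.3 solutions per problem", and a little over half of all generated solutions pass Math Verify.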
## Dataset curation
We keep only the solutions that fit within the 16k-token budget and follow the `<think>...</think>` reasoning format.
Only the non-synthetic problems from NuminaMath-1.5 were used.
For a more curated sample of this dataset and more details please see [open-r1/OpenR1-Math-220k](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k).
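The two curation rules above (token budget and `<think>...</think>` format) could be expressed as a filter along the following lines. This is only a sketch: the card does not say which tokenizer was used or whether "16k" means 16,000 or 16,384, so `count_tokens` is a hypothetical whitespace-based stand-in and `TOKEN_BUDGET` is an assumption.

```python
import re
from typing import Callable

TOKEN_BUDGET = 16_384  # assumption: "16k" read as 16 * 1024

# A solution must contain a complete <think>...</think> block.
THINK_PATTERN = re.compile(r"<think>.*?</think>", re.DOTALL)


def count_tokens(text: str) -> int:
    """Hypothetical stand-in tokenizer: counts whitespace-separated pieces."""
    return len(text.split())


def keep_solution(text: str,
                  budget: int = TOKEN_BUDGET,
                  tokenizer: Callable[[str], int] = count_tokens) -> bool:
    """Return True if the solution fits the budget and has the reasoning format."""
    return tokenizer(text) <= budget and THINK_PATTERN.search(text) is not None


print(keep_solution("<think>2 + 2 = 4</think> The answer is 4."))  # True
print(keep_solution("The answer is 4."))                           # False
```

To approximate the real filter more closely, pass a function backed by the model's own tokenizer as `tokenizer`.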
## License
The dataset is licensed under Apache 2.0.
# OpenR1-Math-Raw
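## Dataset description
OpenR1-Math-Raw is a large-scale dataset for mathematical reasoning. It consists of 516,499 math problems sourced from [AI-MO/NuminaMath-1.5](https://huggingface.co/datasets/AI-MO/NuminaMath-1.5), each with 1 to 8 reasoning traces generated by [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1).
The traces were verified using [Math Verify](https://github.com/huggingface/Math-Verify) and a Large Language Model (LLM) based judge (Llama-3.3-70B-Instruct).
The dataset contains:
- `516,499` problems
- `1,209,403` R1-generated solutions, with 2.3 solutions per problem on average
- Answers re-parsed by Llama-3.3-70B-Instruct (`reparsed_answers`)
The distribution of correct answers is as follows: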
| Metric                            | Correct generations | Total generations | Correct problems | Total problems |
|-----------------------------------|---------------------|-------------------|------------------|----------------|
| Math Verify, reparsed answer      | 679,358             | 944,106           | 266,654          | 376,956        |
| LLaMA verification                | 602,766             | 944,106           | 308,391          | 376,956        |
| Math Verify, original answer      | 613,535             | 944,106           | 238,951          | 376,956        |
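The counts in the table can be turned into rates with a short calculation. This sketch uses only the numbers from the table; the English row labels and variable names are chosen here for illustration.

```python
# Totals shared by every row of the table.
TOTAL_GENERATIONS = 944_106
TOTAL_PROBLEMS = 376_956

# metric -> (correct generations, correct problems)
counts = {
    "Math Verify, reparsed answer": (679_358, 266_654),
    "LLaMA verification": (602_766, 308_391),
    "Math Verify, original answer": (613_535, 238_951),
}

# metric -> (generation-level rate, problem-level rate)
rates = {
    name: (gen / TOTAL_GENERATIONS, prob / TOTAL_PROBLEMS)
    for name, (gen, prob) in counts.items()
}

for name, (gen_rate, prob_rate) in rates.items():
    print(f"{name:<30}{gen_rate:>8.1%}{prob_rate:>8.1%}")
```

Run as written, this gives about 72.0% / 70.7% for the reparsed Math Verify check, 63.8% / 81.8% for the LLaMA check, and 65.0% / 63.4% for the original Math Verify check (generation-level / problem-level).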
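You can load the dataset as follows:
```python
from datasets import load_dataset

# Download (or reuse the cached copy of) the training split from the Hugging Face Hub.
ds = load_dataset("open-r1/OpenR1-Math-Raw", split="train")
```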
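## Dataset curation
We keep only the solutions that fit within the 16k-token context limit and follow the `<think>...</think>` reasoning format. Only the non-synthetic problems from NuminaMath-1.5 were used.
For a more curated sample of this dataset and more details, see [open-r1/OpenR1-Math-220k](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k).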
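## Changelog
### [Version 1.1]
- Added the `reparsed_answers` field, generated by meta-llama/Meta-Llama-3.3-70B-Instruct, which uses a preset prompt to extract the answer from the `solution` field.
- Added the `correctness` field, which holds the math-verify results for the `reparsed_answers`/`answer` fields and the LLM-based verification results from meta-llama/Meta-Llama-3.3-70B-Instruct.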
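One way the `correctness` field might be used is to pick out the generations that a given verifier accepted. The sketch below works on plain dictionaries and makes several assumptions that are not stated in this card: that each row has a `generations` list, and that `correctness` is a mapping with the keys `math_verify_answer`, `math_verify_reparsed_answer`, and `llama_verification`, each holding one boolean per generation. Check `ds.features` for the real schema before relying on it.

```python
from typing import Any, Dict, List

# Assumed sub-keys of `correctness` (not confirmed by this card).
ASSUMED_KEYS = ("math_verify_answer", "math_verify_reparsed_answer", "llama_verification")


def correct_generations(row: Dict[str, Any],
                        key: str = "math_verify_reparsed_answer") -> List[str]:
    """Return the generations that the chosen verifier marked as correct."""
    flags = row["correctness"][key]
    return [gen for gen, ok in zip(row["generations"], flags) if ok]


def accepted_by_any(row: Dict[str, Any]) -> List[int]:
    """Return indices of generations accepted by at least one verifier."""
    per_verifier = [row["correctness"][k] for k in ASSUMED_KEYS]
    return [i for i, flags in enumerate(zip(*per_verifier)) if any(flags)]


# Toy row with the assumed layout, for illustration only.
toy_row = {
    "generations": ["trace A", "trace B", "trace C"],
    "correctness": {
        "math_verify_answer": [True, False, False],
        "math_verify_reparsed_answer": [True, True, False],
        "llama_verification": [False, True, False],
    },
}

print(correct_generations(toy_row))  # ['trace A', 'trace B']
print(accepted_by_any(toy_row))      # [0, 1]
```

Taking the union of verifiers, as `accepted_by_any` does, favors recall; requiring agreement between them would favor precision.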
## License
The dataset is licensed under the Apache 2.0 open-source license.
Provider: maas
Created: 2025-02-13



