facebook/natural_reasoning

Name: facebook/natural_reasoning
Creator: maas
Published: 2026-01-02 16:23:55
License: No description available

ModelScope community · Updated 2026-01-02 · Indexed 2025-02-22

Download link:

https://modelscope.cn/datasets/AI-ModelScope/natural_reasoning

Resource description:

[NaturalReasoning](https://arxiv.org/abs/2502.13124) is a large-scale dataset for general reasoning tasks. It consists of high-quality challenging reasoning questions backtranslated from pretraining corpora [DCLM](https://github.com/mlfoundations/dclm) and [FineMath](https://huggingface.co/datasets/HuggingFaceTB/finemath). The questions have been deduplicated and decontaminated from popular reasoning benchmarks including MATH, GPQA, MMLU-Pro, MMLU-STEM. For each question, we extract the reference final answer from the original document from the pretraining corpora if possible. We also provide a model-generated response from Llama3.3-70B-Instruct.

We release a 1.1 million subset of NaturalReasoning to the research community to foster research on training strong LLM reasoners.

You can load the dataset as follows:

```python
from datasets import load_dataset

ds = load_dataset("facebook/natural_reasoning")
```

For more information regarding data collection, please refer to our [paper](https://arxiv.org/abs/2502.13124).

## Reference Answer Statistics

In the 1.1 million subset, 18.29% of the questions do not have a reference answer, 9.71% of the questions have a single word answer, 21.58% of the questions have a short answer while 50.42% of the questions have a long reference answer.

## Scaling Curve

Training on NaturalReasoning shows better scaling effects than training on other datasets when training Llama3.1-8B-Instruct model. In particular, we measure the average performance on three benchmarks: MATH, GPQA, MMLU-Pro.

<img src="https://cdn-uploads.huggingface.co/production/uploads/659a395421a7431643caedda/S6aO-agjRRhc0JLkohZ5z.jpeg" style="width:50%; max-width:400px;">

## Citation

If you use data from NaturalReasoning, please cite with the following BibTex entry:

```
@misc{yuan2025naturalreasoningreasoningwild28m,
      title={NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions},
      author={Weizhe Yuan and Jane Yu and Song Jiang and Karthik Padthe and Yang Li and Dong Wang and Ilia Kulikov and Kyunghyun Cho and Yuandong Tian and Jason E Weston and Xian Li},
      year={2025},
      eprint={2502.13124},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.13124},
}
```
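
Returning to the reference-answer statistics reported above: a breakdown of that kind could be approximated on a sample with a sketch like the following. Several things here are assumptions rather than facts from the dataset card: the column name `reference_answer`, the `train` split name, the helper names, and above all the 10-word cut-off between "short" and "long", since the card does not say where that boundary lies. The percentages it produces are therefore not expected to match the published ones exactly.

```python
from collections import Counter

# Hypothetical cut-off: the dataset card does not define where "short" ends
# and "long" begins, so this value is purely illustrative.
SHORT_ANSWER_MAX_WORDS = 10


def bucket_reference_answer(answer, short_max_words=SHORT_ANSWER_MAX_WORDS):
    """Classify one reference answer as 'none', 'single_word', 'short', or 'long'."""
    words = (answer or "").split()  # None and "" both count as "no answer"
    if not words:
        return "none"
    if len(words) == 1:
        return "single_word"
    if len(words) <= short_max_words:
        return "short"
    return "long"


def answer_length_distribution(records, answer_field="reference_answer"):
    """Return the percentage of records in each bucket.

    `records` is any iterable of dict-like rows; `answer_field` is an assumed
    column name and may need adjusting to the actual schema.
    """
    counts = Counter(bucket_reference_answer(r.get(answer_field)) for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bucket: 100.0 * n / total for bucket, n in counts.items()}


def sample_distribution(n=1000):
    """Stream the first n rows from the Hub and bucket them (needs network access)."""
    # Imported lazily so the helpers above stay standard-library only.
    from datasets import load_dataset

    ds = load_dataset("facebook/natural_reasoning", split="train", streaming=True)
    return answer_length_distribution(ds.take(n))
```

Streaming avoids downloading the full 1.1 million rows just to inspect a sample. Calling `sample_distribution(1000)` returns a dictionary mapping each bucket name to its share of the sampled rows.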

Provider:

maas

Created:

2025-02-21

Collection summary

Dataset introduction