Nemotron-CrossThink

Name: Nemotron-CrossThink
Creator: maas
Published: 2025-12-04 09:19:27
License: No description available

ModelScope community. Updated: 2025-12-04. Indexed: 2025-05-03.

Download link:

https://modelscope.cn/datasets/nv-community/Nemotron-CrossThink


Resource overview:

# Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning

**Author: Syeda Nahida Akter, Shrimai Prabhumoye, Matvei Novikov, Seungju Han, Ying Lin, Evelina Bakhturina, Eric Nyberg, Yejin Choi, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro**

[[Paper]](https://arxiv.org/abs/2504.13941)[[Blog]](https://research.nvidia.com/labs/adlr/Nemotron-CrossThink/)

## Dataset Description

**Nemotron-CrossThink** is a multi-domain reinforcement learning (RL) dataset designed to improve general-purpose and mathematical reasoning in large language models (LLMs). The dataset contains high-quality question-answer pairs with detailed reasoning traces, curated and synthesized from CommonCrawl and high-quality books. Inspired by techniques in MMLU-Pro and PersonaMath, Nemotron-CrossThink focuses on building diverse and verifiable reasoning examples across STEM, humanities, and mathematical problem-solving domains. This dataset is ready for commercial/non-commercial use.

<img class="img-full" src="figures/method_brief.png">

**Figure 1: Nemotron-CrossThink.** *We (a) curate question-answer (QA) pairs from synthetic (Common Crawl) and open-source datasets, categorized into general-purpose reasoning and math reasoning; (b) apply structured templates to convert data into multiple-choice (MCQ) and open-ended formats, promoting diverse reasoning trajectories; (c) filter out unverifiable or ill-formatted responses; (d) train an RL policy using Group Relative Policy Optimization (GRPO). The final reward is used to update the policy, iteratively improving the model’s reasoning capabilities across diverse domains.*

#### Composition of Nemotron-CrossThink

Nemotron-CrossThink consists of two major components:

- **Nemotron-CrossThink-QA:** Question-answer pairs constructed from raw CommonCrawl and open-domain books using category-specific templates inspired by MMLU-Pro.
These samples cover a wide range of disciplines including physics, law, social science, and economics, following both multiple-choice and open-ended formats.
- **Nemotron-CrossThink-Math:** Taking inspiration from [PersonaMath](https://arxiv.org/pdf/2406.20094v1), we generate diverse math problems by extracting personas from CommonCrawl and prompting models to synthesize math problems that exercise specific skills. We extract the skills from existing math benchmarks and diversify them by applying different personas. This subset emphasizes multi-step symbolic reasoning and chain-of-thought generation.

#### Data Preparation

- **Multi-domain Curation:** We gather diverse reasoning data from CommonCrawl and open QA benchmarks, covering both symbolic and contextual domains.
- **Template Standardization:** Structured templates (MCQ, open-ended) are applied to unify question/answer formats and enable verifiable reward modeling.
- **Filtering for Verifiability:** We remove unverifiable samples (e.g., overly long answers, invalid MCQs) to ensure stable RL training.
- **Data Blending:** We design blends of math and general-purpose reasoning data to study their combined effect on model generalization.

The Nemotron-CrossThink dataset contains the following fields:

- **data_source:** Nemotron-CrossThink-QA or Nemotron-CrossThink-Math
- **prompt:** Contains a generic instruction along with the problem
- **reward_model:** Consists of the ground truth solution and the evaluation style
- **meta_data:** Contains the index of the data sample and the split type (train/test). For the math version, we also include the persona and skills that were used to curate the data

#### Key Insights:

- Nemotron-CrossThink enables scalable and verifiable reward modeling beyond mathematics and demonstrates improved accuracies (see Figure 2, left) on both math (**Math-500: +30.1%, AMC: +27.5%**) and non-math reasoning benchmarks (**MMLU-Pro: +12.8%, GPQA-Diamond: +11.3%, AGIEval: +15.1%, SuperGPQA: +3.8%**).
- Moreover, Nemotron-CrossThink exhibits significantly improved response efficiency, generating correct answers using **28% fewer tokens** on average, highlighting more focused and effective reasoning (see Figure 2, right).

<p align="center"> <img class="img-full" src="figures/summary_accuracy_comparison.png" width="40%" style="display:inline-block; margin-right: 10px;"> <img class="img-full" src="figures/token_efficiency.png" width="40%" style="display:inline-block;" > </p>

**Figure 2: (left)** *Employing self-learning with multi-domain data, Nemotron-CrossThink outperforms baseline models, including domain-specific training (Only Math) and Open-Reasoner-Zero (ORZ-7B), achieving consistent gains across all reasoning tasks.* **(right)** *Token efficiency comparison of models trained on Nemotron-CrossThink (multi-domain blend) and two single domain blends (Only Math and ORZ).*

## Dataset Owner(s):

NVIDIA Corporation

## Dataset Creation Date:

September 20, 2024

## License/Terms of Use:

Governing Terms: This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0) available at https://creativecommons.org/licenses/by/4.0/legalcode.

This dataset contains synthetic data created using Qwen/Qwen2.5-Math-72B, Qwen2.5-72B-Instruct. If this dataset is used to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, such AI model may be subject to redistribution and use requirements in the Qwen License Agreement (https://huggingface.co/Qwen/Qwen2.5-Math-72B/blob/main/LICENSE and https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE).

## Data Developer:

NVIDIA

## Intended Usage:

The Nemotron-CrossThink Dataset is intended to be used by the community to deploy reinforcement learning with LLMs. The data may be used to train and evaluate models.
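As a rough illustration of this intended RL use, the sketch below scores sampled responses against the ground truth stored in a record's `reward_model` field and then converts the rewards of one sampled group into GRPO-style group-relative advantages (reward minus group mean, divided by group standard deviation). The inner key names (`ground_truth`, `style`, `index`, `split`), the `\boxed{}` answer convention, the example record, and all function names are assumptions made for illustration; they are not taken from the dataset card, and the authors' actual reward implementation may differ.

```python
import re
import statistics


def extract_answer(response: str) -> str:
    """Return the content of the last \\boxed{...} span, or the stripped response.

    The \\boxed{} convention is an assumption, not something the card specifies.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", response)
    return matches[-1].strip() if matches else response.strip()


def verifiable_reward(response: str, reward_model: dict) -> float:
    """Return 1.0 on a case-insensitive exact match with the ground truth, else 0.0."""
    predicted = extract_answer(response).lower()
    # "ground_truth" is a hypothetical key name for the stored solution.
    expected = str(reward_model["ground_truth"]).strip().lower()
    return 1.0 if predicted == expected else 0.0


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize rewards within one sampled group: (r - mean) / std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        # Every response scored the same, so the group carries no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


# Hypothetical record that mirrors the four documented top-level fields.
record = {
    "data_source": "Nemotron-CrossThink-QA",
    "prompt": "Answer the question and put the final choice in \\boxed{}. ...",
    "reward_model": {"ground_truth": "B", "style": "rule"},
    "meta_data": {"index": 0, "split": "train"},
}

# Four sampled responses for the same prompt (one GRPO group).
responses = ["... \\boxed{B}", "... \\boxed{C}", "\\boxed{b}", "no idea"]
rewards = [verifiable_reward(r, record["reward_model"]) for r in responses]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Exact matching is the simplest possible verifier; it only works because the templates and filtering described above keep answers short and well-formed. A real pipeline would likely need format checks and, for math answers, some form of equivalence checking.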
## Data Version:

- v1

## Dataset Characterization

- Data Collection Method: Synthetic
- Labeling Method: Automated

## Dataset Format

Text

## Dataset Quantification

- Record Count: 287,376 QA pairs
- Feature Count: 2. We have two domains in the data. They are: (1) Nemotron-CrossThink-QA, and (2) Nemotron-CrossThink-Math.
- Total Data Storage: 638MB

## Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

## Citation

<pre>
@misc{akter2025nemotroncrossthinkscalingselflearningmath,
      title={NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning},
      author={Syeda Nahida Akter and Shrimai Prabhumoye and Matvei Novikov and Seungju Han and Ying Lin and Evelina Bakhturina and Eric Nyberg and Yejin Choi and Mostofa Patwary and Mohammad Shoeybi and Bryan Catanzaro},
      year={2025},
      eprint={2504.13941},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2504.13941},
}
</pre>


Provider:

maas

Created:

2025-05-01
