WebInstruct-verified

ModelScope Community · Updated 2026-01-09 · Indexed 2025-04-26
Download link:
https://modelscope.cn/datasets/TIGER-Lab/WebInstruct-verified
Description:
# General-Reasoner: Advancing LLM Reasoning Across All Domains

<p align="center">
  <a href="https://github.com/TIGER-AI-Lab/General-Reasoner" target="_blank">💻 Code</a> |
  <a href="https://arxiv.org/abs/2505.14652" target="_blank">📄 Paper</a> |
  <a href="https://huggingface.co/datasets/TIGER-Lab/WebInstruct-verified" target="_blank">📊 Dataset</a> |
  <a href="https://huggingface.co/collections/TIGER-Lab/general-reasoner-67fe9386e43e046489eac013" target="_blank">🤗 Model</a> |
  <a href="https://tiger-ai-lab.github.io/General-Reasoner/" target="_blank">🌐 Project Page</a>
</p>

## Overview

<p align="center">
  <img src="https://tiger-ai-lab.github.io/General-Reasoner/static/images/teaser.png" alt="General-Reasoner Teaser" width="650"/>
</p>
<p align="center" style="font-style: italic; font-size: 0.95rem;">
  <em>Figure: Effectiveness of <strong>General-Reasoner</strong> trained with diverse verifiable reasoning questions using model-based verifier compared to baseline methods on various reasoning tasks.</em>
</p>

**General-Reasoner** is a training paradigm for large language models (LLMs), designed to robustly enhance reasoning abilities across diverse domains: not just mathematics and coding, but also physics, chemistry, finance, humanities, and more.

**Key features:**

- **Zero RL Training:** Direct reinforcement learning from base LLMs, bypassing intermediate supervised stages.
- **Diverse Reasoning Data:** 230K+ high-quality, verifiable questions sourced from the web and filtered for answer verifiability across disciplines.
- **Model-Based Verifier:** Compact 1.5B generative verifier model for context-aware, chain-of-thought answer validation, outperforming traditional rule-based methods.

**This repo contains the Diverse Reasoning Data, WebInstruct-verified.**

## Dataset Details

We construct a diverse, high-quality dataset to facilitate robust reasoning capabilities across a broad range of domains, extending beyond the commonly studied mathematical problems.

- **We trace back the data in WebInstruct to its original web page to re-crawl the question-answer pairs.** If the original page lacks human-written answers, we drop the entry. This ensures every re-crawled item is human-verified and, therefore, that each answer is of reliable quality.
- **Gemini-1.5-Pro is employed to selectively extract questions with clearly verifiable short answers,** further boosting dataset reliability.
- **Gemini-2.0-Flash then generates eight candidate answers per question for additional filtering:**
  - We discard any question for which **all eight Gemini-generated answers are incorrect**, eliminating ambiguous or noisy items that arose during web scraping.
  - We also remove **overly simple questions**, those for which **all eight candidate answers are correct**, to preserve dataset complexity and better challenge model generalization.

These steps ensure the correctness of the constructed dataset.

## Distribution

The distribution of disciplines is depicted as follows:

<img src="https://cdn-uploads.huggingface.co/production/uploads/6313a86154e6e5d9f0f94e04/I_TplgIibmBM_A_nwZh7B.png" width="600"/>

## Verification

The short answers take different forms, including float, array, matrix, LaTeX, etc. To verify these answers, please use GPT/Gemini or the locally served model at https://huggingface.co/TIGER-Lab/general-verifier.

## Notes

- As discussed in [discussion_3](https://huggingface.co/datasets/TIGER-Lab/WebInstruct-verified/discussions/3#6912ee9987d86c668866171a), noticed by @zlk, some multiple-choice questions have options missing from the question. We have fixed this in the latest dataset. The original version of the dataset can be found in the `train_legacy` split.

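
The candidate-answer filter described under Dataset Details (drop a question when all eight generated answers are wrong, and also when all eight are right) can be illustrated with a minimal sketch. This is not the authors' code: the function name `keep_question`, the constant `NUM_CANDIDATES`, and the boolean-list input are hypothetical, and how each candidate is judged correct is left outside the sketch.

```python
from typing import Sequence

# The dataset card states that eight candidate answers are generated per question.
NUM_CANDIDATES = 8


def keep_question(candidate_correct: Sequence[bool]) -> bool:
    """Return True if a question survives the candidate-answer filter.

    candidate_correct[i] records whether the i-th generated answer was judged
    correct against the reference answer. The judging step itself is an
    assumption of this sketch and is not implemented here.
    """
    if len(candidate_correct) != NUM_CANDIDATES:
        raise ValueError(f"expected {NUM_CANDIDATES} judgments per question")
    num_correct = sum(candidate_correct)
    # 0 correct: treated as ambiguous or noisy. All correct: treated as too easy.
    return 0 < num_correct < NUM_CANDIDATES


# Hypothetical records: a question id mapped to its eight correctness judgments.
judgments = {
    "q_noisy": [False] * 8,
    "q_easy": [True] * 8,
    "q_kept": [True, False, True, False, False, False, True, False],
}
kept_ids = [qid for qid, flags in judgments.items() if keep_question(flags)]
```

In this sketch only `q_kept` passes, since it is the only record with a mix of correct and incorrect candidates.
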
## Citation

If you find our work helpful, please cite:

```bibtex
@inproceedings{
  ma2025generalreasoner,
  title={{G}eneral-{R}easoner: Advancing {LLM} Reasoning Across All Domains},
  author={Xueguang Ma and Qian Liu and Dongfu Jiang and Ge Zhang and Zejun MA and Wenhu Chen},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=pBFVoll8Xa}
}
```
Provider: maas
Created: 2025-04-22