WebInstruct-verified

Name: WebInstruct-verified
Creator: maas
Published: 2026-01-09 20:53:03
License: No description available

ModelScope community. Updated 2026-01-09; indexed 2025-04-26.

Download link:

https://modelscope.cn/datasets/TIGER-Lab/WebInstruct-verified

Resource description:

# General-Reasoner: Advancing LLM Reasoning Across All Domains

<a href="https://github.com/TIGER-AI-Lab/General-Reasoner" target="_blank">💻 Code</a> |
<a href="https://arxiv.org/abs/2505.14652" target="_blank">📄 Paper</a> |
<a href="https://huggingface.co/datasets/TIGER-Lab/WebInstruct-verified" target="_blank">📊 Dataset</a> |
<a href="https://huggingface.co/collections/TIGER-Lab/general-reasoner-67fe9386e43e046489eac013" target="_blank">🤗 Model</a> |
<a href="https://tiger-ai-lab.github.io/General-Reasoner/" target="_blank">🌐 Project Page</a>

## Overview

<img src="https://tiger-ai-lab.github.io/General-Reasoner/static/images/teaser.png" alt="General-Reasoner Teaser" width="650"/>

Figure: Effectiveness of General-Reasoner trained with diverse verifiable reasoning questions using model-based verifier compared to baseline methods on various reasoning tasks.

**General-Reasoner** is a training paradigm for large language models (LLMs), designed to robustly enhance reasoning abilities across diverse domains: not just mathematics and coding, but also physics, chemistry, finance, humanities, and more.

**Key features:**

- **Zero RL Training:** Direct reinforcement learning from base LLMs, bypassing intermediate supervised stages.
- **Diverse Reasoning Data:** 230K+ high-quality, verifiable questions sourced from the web and filtered for answer verifiability across disciplines.
- **Model-Based Verifier:** Compact 1.5B generative verifier model for context-aware, chain-of-thought answer validation, outperforming traditional rule-based methods.

**This repo contains the Diverse Reasoning Data WebInstruct-verified.**

## Dataset Details

We construct a diverse, high-quality dataset to facilitate robust reasoning capabilities across a broad range of domains, extending beyond the commonly studied mathematical problems.

- **We trace back the data in WebInstruct to its original web page to re-crawl the question-answer pairs.** If the original page lacks human-written answers, we drop the entry. This ensures every re-crawled item is human-verified and, therefore, that each answer is of reliable quality.
- **Gemini-1.5-Pro is employed to selectively extract questions with clearly verifiable short answers,** further boosting dataset reliability.
- **Gemini-2.0-Flash then generates eight candidate answers per question for additional filtering:**
  - We discard any question for which **all eight Gemini-generated answers are incorrect**, eliminating ambiguous or noisy items that arose during web scraping.
  - We also remove **overly simple questions** (those for which **all eight candidate answers are correct**) to preserve dataset complexity and better challenge model generalization.

These steps ensure the correctness of the constructed dataset.

## Distribution

The distribution of disciplines is depicted as follows:

<img src="https://cdn-uploads.huggingface.co/production/uploads/6313a86154e6e5d9f0f94e04/I_TplgIibmBM_A_nwZh7B.png" width="600"/>

## Verification

The short answers take different forms, including floats, arrays, matrices, and LaTeX expressions. To verify these answers, please use GPT/Gemini or the locally served model at https://huggingface.co/TIGER-Lab/general-verifier.

## Notes

- As raised by @zlk in [discussion_3](https://huggingface.co/datasets/TIGER-Lab/WebInstruct-verified/discussions/3#6912ee9987d86c668866171a), some multiple-choice questions were missing their options in the question text. We have fixed this in the latest dataset. The original version of the dataset can be found in the `train_legacy` split.

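The candidate-answer filter described under Dataset Details (drop a question when all eight candidate answers are wrong, and also when all eight are right) can be illustrated with a minimal sketch. This is not the authors' code: the function names `keep_question` and `filter_questions` and the per-question list of boolean correctness flags are hypothetical, and how each candidate is judged correct is assumed to happen upstream.

```python
from typing import Dict, List

# The README states that eight candidate answers are generated per question.
NUM_CANDIDATES = 8


def keep_question(candidate_correct: List[bool]) -> bool:
    """Return True if the question survives the filter.

    `candidate_correct` is a hypothetical list of eight flags, one per
    candidate answer, saying whether that candidate matched the reference.
    """
    if len(candidate_correct) != NUM_CANDIDATES:
        raise ValueError(f"expected {NUM_CANDIDATES} flags, got {len(candidate_correct)}")
    n_correct = sum(candidate_correct)
    # All wrong -> likely ambiguous or noisy; all right -> too easy.
    return 0 < n_correct < NUM_CANDIDATES


def filter_questions(judgments: Dict[str, List[bool]]) -> List[str]:
    """Return the ids of questions that pass the filter, in input order."""
    return [qid for qid, flags in judgments.items() if keep_question(flags)]


if __name__ == "__main__":
    demo = {
        "q1": [False] * 8,          # every candidate wrong: discarded
        "q2": [True] * 8,           # every candidate right: discarded
        "q3": [True, False] * 4,    # mixed outcome: kept
    }
    print(filter_questions(demo))   # prints ['q3']
```

Only questions with a mixed outcome remain, which matches the stated goal of removing both noisy and overly simple items.
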
## Citation

If you find our work helpful, please cite:

```bibtex
@inproceedings{ma2025generalreasoner,
  title={{G}eneral-{R}easoner: Advancing {LLM} Reasoning Across All Domains},
  author={Xueguang Ma and Qian Liu and Dongfu Jiang and Ge Zhang and Zejun MA and Wenhu Chen},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=pBFVoll8Xa}
}
```


Provider: maas

Created: 2025-04-22
