WebInstruct-verified
ModelScope Community | Updated 2026-01-09 | Indexed 2025-04-26
Download link:
https://modelscope.cn/datasets/TIGER-Lab/WebInstruct-verified
Description:
# General-Reasoner: Advancing LLM Reasoning Across All Domains
<p align="center">
<a href="https://github.com/TIGER-AI-Lab/General-Reasoner" target="_blank">💻 Code</a> |
<a href="https://arxiv.org/abs/2505.14652" target="_blank">📄 Paper</a> |
<a href="https://huggingface.co/datasets/TIGER-Lab/WebInstruct-verified" target="_blank">📊 Dataset</a> |
<a href="https://huggingface.co/collections/TIGER-Lab/general-reasoner-67fe9386e43e046489eac013" target="_blank">🤗 Model</a> |
<a href="https://tiger-ai-lab.github.io/General-Reasoner/" target="_blank">🌐 Project Page</a>
</p>
## Overview
<p align="center">
<img src="https://tiger-ai-lab.github.io/General-Reasoner/static/images/teaser.png" alt="General-Reasoner Teaser" width="650"/>
</p>
<p align="center" style="font-style: italic; font-size: 0.95rem;">
<em>
Figure: Performance of <strong>General-Reasoner</strong>, trained on diverse verifiable reasoning questions with a model-based verifier, compared with baseline methods on various reasoning tasks.
</em>
</p>
**General-Reasoner** is a training paradigm for large language models (LLMs) designed to robustly enhance reasoning abilities across diverse domains: not just mathematics and coding, but also physics, chemistry, finance, humanities, and more.
**Key features:**
- **Zero RL Training:** Direct reinforcement learning from base LLMs, bypassing intermediate supervised stages.
- **Diverse Reasoning Data:** 230K+ high-quality, verifiable questions sourced from the web and filtered for answer verifiability across disciplines.
- **Model-Based Verifier:** Compact 1.5B-parameter generative verifier for context-aware, chain-of-thought answer validation, outperforming traditional rule-based methods.
**This repo contains the Diverse Reasoning Data: WebInstruct-verified.**
## Dataset Details
We construct a diverse, high-quality dataset to support robust reasoning across a broad range of domains, extending beyond the commonly studied mathematical problems.
- **We trace each WebInstruct entry back to its original web page and re-crawl the question-answer pair.**
  If the original page lacks a human-written answer, we drop the entry. This ensures that every re-crawled item is human-verified and, therefore, that each answer is of reliable quality.
- **Gemini-1.5-Pro is employed to selectively extract questions with clearly verifiable short answers,** further boosting dataset reliability.
- **Gemini-2.0-Flash then generates eight candidate answers per question for additional filtering:**
  - We discard any question for which **all eight Gemini-generated answers are incorrect**, eliminating ambiguous or noisy items that arose during web scraping.
  - We also remove **overly simple questions**, i.e., those for which **all eight candidate answers are correct**, to preserve dataset complexity and better challenge model generalization.
These steps ensure the correctness of the constructed dataset.
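Together, the two Gemini-2.0-Flash filtering rules keep only questions with a mixed pass rate. The Python sketch below restates that rule. It assumes that each of the eight candidate answers has already been judged correct or incorrect against the reference answer. This card does not specify how that judgment is made. `keep_question` is a hypothetical helper for illustration, not part of the released code.

```python
from typing import Sequence

# The card states that eight candidate answers are generated per question.
NUM_CANDIDATES = 8


def keep_question(candidate_correct: Sequence[bool]) -> bool:
    """Hypothetical restatement of the difficulty filter.

    `candidate_correct[i]` is True when candidate answer i was judged correct
    (how that judgment is obtained is assumed, not specified by the card).
    Questions where every candidate is wrong (likely noisy or ambiguous) or
    every candidate is right (too easy) are dropped.
    """
    if len(candidate_correct) != NUM_CANDIDATES:
        raise ValueError(
            f"expected {NUM_CANDIDATES} judgments, got {len(candidate_correct)}"
        )
    num_correct = sum(candidate_correct)  # True counts as 1, False as 0
    return 0 < num_correct < NUM_CANDIDATES


if __name__ == "__main__":
    examples = {
        "all_wrong": [False] * 8,  # dropped as noisy
        "all_right": [True] * 8,   # dropped as too simple
        "mixed": [True, False, True, True, False, False, True, False],  # kept
    }
    for name, judgments in examples.items():
        print(name, keep_question(judgments))
```

Writing the filter as `0 < num_correct < 8` makes it explicit that a question survives only when the candidates disagree. That single condition removes both the noisy (all-wrong) and the trivial (all-right) items.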
## Distribution
The distribution of questions across disciplines is shown below:
<img src="https://cdn-uploads.huggingface.co/production/uploads/6313a86154e6e5d9f0f94e04/I_TplgIibmBM_A_nwZh7B.png" width="600"/>
## Verification
Short answers come in various forms, including floats, arrays, matrices, and LaTeX expressions. To verify these answers, use GPT/Gemini, or serve the verifier model at https://huggingface.co/TIGER-Lab/general-verifier locally.
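Answers come in many forms, so exact string matching is unreliable, and the card recommends a model-based verifier instead. The sketch below illustrates where a simple rule stops working. `quick_numeric_check` is a hypothetical pre-check and is not part of the dataset or of general-verifier. It only decides pairs where both answers parse as plain numbers, using an assumed relative tolerance of 1e-4. For every other pair (arrays, matrices, LaTeX, and so on) it returns `None`, so those pairs can be routed to GPT/Gemini or to the verifier model.

```python
import math
from typing import Optional


def quick_numeric_check(
    prediction: str, reference: str, rel_tol: float = 1e-4
) -> Optional[bool]:
    """Hypothetical pre-check for answers that are plain numbers.

    Returns True/False when both strings parse as floats, and None otherwise,
    meaning the pair should be sent to a model-based verifier instead.
    The default tolerance is an illustrative assumption, not from the card.
    """
    try:
        pred_val = float(prediction.strip())
        ref_val = float(reference.strip())
    except ValueError:
        # Arrays, matrices, fractions, LaTeX, etc.: defer to a verifier.
        return None
    return math.isclose(pred_val, ref_val, rel_tol=rel_tol, abs_tol=1e-9)


if __name__ == "__main__":
    print(quick_numeric_check("3.1416", "3.14159"))    # → True
    print(quick_numeric_check(r"\frac{1}{2}", "0.5"))  # → None
```

Returning `None` instead of `False` for unparseable answers keeps the rule conservative. It never marks an equivalent answer as wrong merely because the two answers are written in different forms.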
## Notes
- As reported by @zlk in [discussion #3](https://huggingface.co/datasets/TIGER-Lab/WebInstruct-verified/discussions/3#6912ee9987d86c668866171a), some multiple-choice questions were missing their answer options. We have fixed this in the latest version of the dataset; the original version is available in the `train_legacy` split.
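The following is a minimal sketch for loading the fixed and legacy versions side by side with the Hugging Face `datasets` library. It rests on three assumptions:

- `datasets` is installed.
- The repository ID matches the Hugging Face dataset link above (`TIGER-Lab/WebInstruct-verified`).
- The fixed data lives in a split named `train`. Only `train_legacy` is named on this card.

The wrapper `load_webinstruct_verified` is hypothetical.

```python
def load_webinstruct_verified(split: str = "train"):
    """Hypothetical wrapper: load one split of WebInstruct-verified.

    Assumes the `datasets` package is installed and that the fixed data is in
    a split named "train"; per the card, `train_legacy` holds the original
    version (with some multiple-choice options missing).
    """
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    return load_dataset("TIGER-Lab/WebInstruct-verified", split=split)


# Example usage (requires network access):
# fixed = load_webinstruct_verified("train")
# legacy = load_webinstruct_verified("train_legacy")
# print(fixed.num_rows, legacy.num_rows)
```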
## Citation
If you find our work helpful, please cite:
```bibtex
@inproceedings{
ma2025generalreasoner,
title={{G}eneral-{R}easoner: Advancing {LLM} Reasoning Across All Domains},
author={Xueguang Ma and Qian Liu and Dongfu Jiang and Ge Zhang and Zejun MA and Wenhu Chen},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=pBFVoll8Xa}
}
```
Provided by: maas
Created: 2025-04-22



