
WebInstruct-verified-unfiltered

ModelScope community · Updated 2025-12-05 · Indexed 2025-06-28
Download link:
https://modelscope.cn/datasets/TIGER-Lab/WebInstruct-verified-unfiltered
Description:
**This repo contains the unfiltered version of [WebInstruct-verified](https://huggingface.co/datasets/TIGER-Lab/WebInstruct-verified) from the General-Reasoner work.**

## General-Reasoner: Advancing LLM Reasoning Across All Domains

<p align="center">
  <a href="https://github.com/TIGER-AI-Lab/General-Reasoner" target="_blank">💻 Code</a> |
  <a href="https://arxiv.org/abs/2505.14652" target="_blank">📄 Paper</a> |
  <a href="https://huggingface.co/datasets/TIGER-Lab/WebInstruct-verified" target="_blank">📊 Dataset</a> |
  <a href="https://huggingface.co/collections/TIGER-Lab/general-reasoner-67fe9386e43e046489eac013" target="_blank">🤗 Model</a> |
  <a href="https://tiger-ai-lab.github.io/General-Reasoner/" target="_blank">🌐 Project Page</a>
</p>

## Overview

<p align="center">
  <img src="https://tiger-ai-lab.github.io/General-Reasoner/static/images/teaser.png" alt="General-Reasoner Teaser" width="650"/>
</p>
<p align="center" style="font-style: italic; font-size: 0.95rem;">
  <em>
    Figure: Effectiveness of <strong>General-Reasoner</strong>, trained on diverse verifiable reasoning questions with a model-based verifier, compared to baseline methods on various reasoning tasks.
  </em>
</p>

**General-Reasoner** is a training paradigm for large language models (LLMs), designed to robustly enhance reasoning abilities across diverse domains: not just mathematics and coding, but also physics, chemistry, finance, humanities, and more.

**Key features:**

- **Zero RL Training:** Direct reinforcement learning from base LLMs, bypassing intermediate supervised stages.
- **Diverse Reasoning Data:** 230K+ high-quality, verifiable questions sourced from the web and filtered for answer verifiability across disciplines.
- **Model-Based Verifier:** Compact 1.5B generative verifier model for context-aware, chain-of-thought answer validation, outperforming traditional rule-based methods.

## Dataset Details

We construct a diverse, high-quality dataset to facilitate robust reasoning capabilities across a broad range of domains, extending beyond the commonly studied mathematical problems.

- **We trace back the data in WebInstruct to its original web page to re-crawl the question-answer pairs.** If the original page lacks human-written answers, we drop the entry. This ensures every re-crawled item is human-verified and, therefore, that each answer is of reliable quality.
- **Gemini-1.5-Pro is employed to selectively extract questions with clearly verifiable short answers,** further boosting dataset reliability.
- **Gemini-2.0-Flash then generates eight candidate answers per question for additional filtering:**
  - We discard any question for which **all eight Gemini-generated answers are incorrect**, eliminating ambiguous or noisy items that arose during web scraping.
  - We also remove **overly simple questions** (those for which **all eight candidate answers are correct**) to preserve dataset complexity and better challenge model generalization.

These steps ensure the correctness of the constructed dataset.

## Distribution

The distribution of disciplines is depicted as follows:

<img src="https://cdn-uploads.huggingface.co/production/uploads/6313a86154e6e5d9f0f94e04/I_TplgIibmBM_A_nwZh7B.png" width="600"/>

## Verification

The short answers take different forms, including floats, arrays, matrices, LaTeX, etc. To verify these answers, use GPT/Gemini or the locally served model at https://huggingface.co/TIGER-Lab/general-verifier.

## Citation

If you find our work helpful, please cite:

```bibtex
@article{general-reasoner,
  title={{G}eneral-{R}easoner: Advancing LLM Reasoning Across All Domains},
  author={Xueguang Ma and Qian Liu and Dongfu Jiang and Ge Zhang and Zejun Ma and Wenhu Chen},
  year={2025},
  journal={arXiv:2505.14652},
  url={https://arxiv.org/abs/2505.14652}
}
```
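
For readers who want to reproduce the eight-candidate filtering rule described under Dataset Details, the following is a minimal sketch rather than the authors' actual pipeline. It assumes that each question already carries eight boolean correctness flags, one per Gemini-generated candidate. The field names `question` and `candidate_correct` and the helper names are hypothetical, and how each candidate is judged correct is outside the scope of this sketch.

```python
from typing import Dict, List


def keep_question(candidate_correct: List[bool]) -> bool:
    """Keep a question only if some, but not all, candidates are correct.

    All-incorrect questions are treated as ambiguous or noisy;
    all-correct questions are treated as overly simple.
    """
    n_correct = sum(candidate_correct)
    return 0 < n_correct < len(candidate_correct)


def filter_questions(records: List[Dict]) -> List[Dict]:
    """Apply the rule to records with a hypothetical `candidate_correct` field."""
    return [r for r in records if keep_question(r["candidate_correct"])]


# Toy records; the flags are made up for illustration, not taken from the dataset.
records = [
    {"question": "q1", "candidate_correct": [False] * 8},               # all wrong: dropped
    {"question": "q2", "candidate_correct": [True] * 8},                # all right: dropped
    {"question": "q3", "candidate_correct": [True] * 3 + [False] * 5},  # mixed: kept
]

kept = filter_questions(records)
print([r["question"] for r in kept])
```

With these toy records, only the mixed case `q3` survives both checks.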

Provider:
maas
Created:
2025-06-26