lrsbrgrn/HalluGuard-Preferences-76k

Name: lrsbrgrn/HalluGuard-Preferences-76k
Creator: lrsbrgrn
Published: 2026-04-11 16:46:50
License: apache-2.0

Hugging Face · Updated 2026-04-11 · Indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/lrsbrgrn/HalluGuard-Preferences-76k

Resource description:

---
license: apache-2.0
task_categories:
- text-classification
language:
- en
tags:
- reasoning
- preferences
- hallucination-detection
- data-reformation
- synthetic
- orpo
- rag
size_categories:
- 10K<n<100K
---

<p align="center">
  <img src="https://github.com/lrsbrgrn/blogging-resources/blob/main/HalluGuard/halluguard-prefs.png?raw=true" alt="HalluClaim" width="700"/>
</p>

<div align="center">
  <h1>🛡️ HalluClaim-Prefs: A 76K Synthetic Preference Dataset for Document-Grounded Hallucination Detection</h1>
</div>

## 🌍 Overview

This dataset was used to fine-tune [HalluGuard-Qwen3-4B](https://huggingface.co/lrsbrgrn/HalluGuard-Qwen3-4B) via Odds Ratio Preference Optimization (ORPO). It consists of 76,708 high-quality preference tuples designed to teach the model how to reason about and justify its hallucination detection.

## 📖 Publication

This dataset was introduced in our paper at the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026).

> *HalluGuard: Evidence-Grounded Small Reasoning Models to Mitigate Hallucinations in Retrieval-Augmented Generation*

## 🏗️ Dataset Construction (Chosen vs. Rejected)

The dataset uses a "Size-Based Heuristic" to create pairs. Every example contains a *Prompt* (the task), a *Chosen* response (the good behavior), and a *Rejected* response (the bad behavior).

### 1. The David vs. Goliath Setup

* **Chosen Response:** Generated by [Qwen/Qwen3-235B-A22B](https://huggingface.co/Qwen/Qwen3-235B-A22B). These responses represent high-quality reasoning and correct labels.
* **Rejected Response:** Generated by [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B). These responses represent weaker reasoning, lack of evidence, or incorrect labels.

### 2. Multi-Stage Filtering

To ensure the *Chosen* response is actually superior, we applied:

* **Label Verification:** We discarded the pair if Qwen3-235B-A22B's classification did not match the original ground-truth label.
* **Consensus Filtering:** Two independent "Judge" models ([openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) and [DeepSeek-V3.1](https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus)) reviewed the pairs. Only pairs where both judges preferred Qwen3-235B-A22B's answer were kept.

## 📦 Dataset Structure

| Field | Content |
| :--- | :--- |
| **Prompt** | Instructions + Document + Claim |
| **Chosen** | `<think>Detailed reasoning</think>` + `<answer><classification>Label</classification><justification>Evidence</justification></answer>` |
| **Rejected** | `<think>Flawed/Short reasoning</think>` + `<answer><classification>Label</classification><justification>Evidence</justification></answer>` |

## ⚠️ Limitations

- Synthetic data: may not capture all real-world hallucination patterns
- English-only
- Document-grounded: labels reflect support relative to the document, not real-world truth
- Potential biases inherited from web data (i.e., FineWeb) and style rewriting

## ⚖️ Ethical Considerations

Models trained on HalluClaim should be used as decision-support systems, not fully autonomous systems. Errors may include:

- false positives (flagging correct claims)
- false negatives (missing hallucinations)

Human oversight is recommended, especially in sensitive domains.

## 📚 Citation

```bibtex
@article{bergeron2025halluguard,
  title={HalluGuard: Evidence-Grounded Small Reasoning Models to Mitigate Hallucinations in Retrieval-Augmented Generation},
  author={Bergeron, Loris and Buhnila, Ioana and François, Jérôme and State, Radu},
  journal={arXiv preprint arXiv:2510.00880},
  year={2025}
}
```

Provided by:

lrsbrgrn

Curated summary

Dataset introduction

Construction method

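In retrieval-augmented generation, hallucination detection is a key challenge for ensuring information reliability. The HalluGuard-Preferences-76k dataset is built with a size-based heuristic and is presented as high-quality preference tuples. Each example contains a prompt, a chosen response, and a rejected response. The chosen response is generated by the strong Qwen3-235B-A22B model and represents correct classification and detailed reasoning; the rejected response comes from the weaker Qwen3-0.6B model and exhibits flawed reasoning or incorrect labels. To ensure data quality, construction applies multi-stage filtering: label verification against the ground-truth label, followed by consensus filtering with two independent judge models, keeping only pairs where both judges prefer the strong model's response. The result is more than 76,000 refined synthetic preference examples.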
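The two filtering stages described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `CandidatePair` class, its field names, the label strings, and the `"chosen"`/`"rejected"` vote encoding are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CandidatePair:
    """Hypothetical record for one candidate preference pair."""
    gold_label: str               # ground-truth label of the claim
    chosen_label: str             # label predicted by the large model
    judge_votes: Tuple[str, ...]  # one vote per judge: "chosen" or "rejected"


def keep_pair(pair: CandidatePair) -> bool:
    # Stage 1, label verification: the large model must agree with the gold label.
    if pair.chosen_label != pair.gold_label:
        return False
    # Stage 2, consensus filtering: every judge must prefer the large model's answer.
    return all(vote == "chosen" for vote in pair.judge_votes)


def filter_pairs(pairs: List[CandidatePair]) -> List[CandidatePair]:
    return [p for p in pairs if keep_pair(p)]


# Illustrative data only; the label strings are assumptions.
candidates = [
    CandidatePair("supported", "supported", ("chosen", "chosen")),        # kept
    CandidatePair("supported", "hallucinated", ("chosen", "chosen")),     # fails stage 1
    CandidatePair("hallucinated", "hallucinated", ("chosen", "rejected")) # fails stage 2
]
kept = filter_pairs(candidates)
print(len(kept))  # prints 1
```

Running label verification first means a pair whose chosen label is already wrong is dropped before any judge vote is consulted.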

Features

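The dataset is designed for document-grounded hallucination detection. Its core features are a structured response format and high-quality reasoning content. Every response follows a fixed template with a chain-of-thought reasoning part and an answer part; the answer is further split into a classification label and an evidence-based justification, a design that helps models learn a systematic reasoning process. The dataset consists entirely of English synthetic data, contains more than 76,000 examples, and covers diverse document-claim combinations. Although the data is synthetic, the multi-model judging and verification pipeline is intended to ensure that chosen responses are superior in logical rigor, sufficiency of evidence, and classification accuracy, giving a clear preference signal for model optimization.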
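As a hedged illustration of the response template, the sketch below splits a response into its reasoning, label, and justification parts with a regular expression. The tag names come from the dataset card; the function name `parse_response`, the returned keys, and the sample text (including the label string) are assumptions made for this example.

```python
import re
from typing import Dict, Optional

# Tag layout taken from the dataset card:
# <think>...</think> + <answer><classification>...</classification>
#                              <justification>...</justification></answer>
RESPONSE_RE = re.compile(
    r"<think>(?P<reasoning>.*?)</think>\s*"
    r"<answer>\s*<classification>(?P<label>.*?)</classification>\s*"
    r"<justification>(?P<evidence>.*?)</justification>\s*</answer>",
    re.DOTALL,  # reasoning and justification may span several lines
)


def parse_response(text: str) -> Optional[Dict[str, str]]:
    """Return the three parts of a response, or None if the template is not matched."""
    match = RESPONSE_RE.search(text)
    if match is None:
        return None
    return {name: value.strip() for name, value in match.groupdict().items()}


# Made-up sample; real label strings in the dataset may differ.
sample = (
    "<think>The document states the bridge opened in 1932.</think>"
    "<answer><classification>SUPPORTED</classification>"
    "<justification>The opening year matches the document.</justification></answer>"
)
parsed = parse_response(sample)
print(parsed["label"])
```

Returning `None` for malformed text makes it easy to count how many responses deviate from the template before training.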

Usage

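The dataset is mainly used to fine-tune language models with algorithms such as Odds Ratio Preference Optimization (ORPO), to improve reasoning and judgment in document-grounded hallucination detection. Users can load the dataset directly and extract (prompt, chosen, rejected) triples as training samples. During training, the aim is for the model to learn the careful reasoning structure and evidence citation shown in the chosen responses while avoiding the reasoning flaws in the rejected ones. Given the dataset's synthetic nature and potential biases, trained models are best used as decision-support tools, combined with human review in sensitive domains to reduce the risk of misjudgment. Note that the labels judge only whether the document supports a claim, not real-world truth.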
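To make the role of the chosen and rejected responses concrete, here is a minimal pure-Python sketch of a per-example ORPO objective as it is commonly formulated: a supervised term on the chosen response plus a weighted odds-ratio term. It is not the authors' training code. The inputs are assumed to be average per-token log-probabilities that a model assigns to each response, and the function names and the default weight `lam=0.1` are illustrative choices.

```python
import math


def log_odds(avg_logprob: float) -> float:
    """log(p / (1 - p)) for p = exp(avg_logprob), the length-normalized likelihood."""
    if avg_logprob >= 0.0:
        raise ValueError("avg_logprob must be strictly negative")
    # log1p(-exp(x)) evaluates log(1 - p) without forming 1 - p explicitly.
    return avg_logprob - math.log1p(-math.exp(avg_logprob))


def orpo_loss(chosen_avg_logprob: float,
              rejected_avg_logprob: float,
              lam: float = 0.1) -> float:
    """Per-example loss: SFT term on the chosen response + lam * odds-ratio term."""
    log_odds_ratio = log_odds(chosen_avg_logprob) - log_odds(rejected_avg_logprob)
    # -log(sigmoid(z)) is written as log(1 + exp(-z)).
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    sft_term = -chosen_avg_logprob  # negative log-likelihood of the chosen response
    return sft_term + lam * odds_ratio_term


# The loss is lower when the model already favors the chosen response.
favors_chosen = orpo_loss(chosen_avg_logprob=-0.5, rejected_avg_logprob=-2.0)
favors_rejected = orpo_loss(chosen_avg_logprob=-2.0, rejected_avg_logprob=-0.5)
print(favors_chosen < favors_rejected)  # prints True
```

In practice a training library would compute these quantities over batches of tokenized (prompt, chosen, rejected) triples; the sketch only shows how the two responses enter the objective.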

Background and challenges

Background overview

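HalluGuard-Preferences-76k was built in 2025 by Loris Bergeron and colleagues to address the hallucination problem common in retrieval-augmented generation systems. It was presented at the 64th Annual Meeting of the Association for Computational Linguistics. The core research focuses on detecting and mitigating document-grounded hallucinations with evidence-driven reasoning models. With more than 76,000 high-quality preference tuples, the dataset provides structured supervision for training small reasoning models and supports a shift in hallucination detection from pure classification toward explainable reasoning.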

Current challenges

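The dataset targets document-grounded hallucination detection, where the difficulty is that a model must judge whether generated content is consistent with the source document rather than whether it is true in an absolute sense. During construction, the researchers faced the inherent limits of synthetic data: it may not fully cover the complex hallucination patterns seen in the real world, and relying on large language models to generate responses can introduce stylistic bias. In addition, ensuring the quality of the preference pairs required multi-stage filtering, namely label verification and two-judge consensus filtering, which adds complexity and compute cost to data cleaning.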

Common scenarios

Classic use cases

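In retrieval-augmented generation (RAG) systems, models often produce factual hallucinations when they integrate information poorly; HalluGuard-Preferences-76k provides targeted training data for this problem. Through its constructed preference pairs, the dataset guides models to reason over a given document and so distinguish supported statements from fabricated ones. A typical use is fine-tuning a small language model for evidence-driven hallucination detection to make RAG pipelines more reliable.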

Derived related work

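The dataset led to work such as HalluGuard-Qwen3-4B, which demonstrates fine-tuning a small model with Odds Ratio Preference Optimization (ORPO). Related research further explores the potential of synthetic preference data for model alignment and robustness, providing a new methodological basis for more efficient and reliable hallucination detection and mitigation systems.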

Recent research on the dataset