FactCHD-Hallucination Detection Bench

Name: FactCHD-Hallucination Detection Bench
Creator: maas
Published: 2025-09-07 02:34:48
License: No description available

ModelScope community, updated 2025-09-07, indexed 2024-05-15

Download link:

https://modelscope.cn/datasets/ZJUNLP/FactCHD

Resource overview:

Despite their impressive generative capabilities, Large Language Models (LLMs) face limitations in real-world applications due to factual conflict hallucinations. Particularly in complex reasoning scenarios, accurately detecting hallucinations in content produced by LLMs remains a relatively under-explored research direction. We thus introduce FactCHD, a benchmark dataset dedicated to detecting factual conflict hallucinations in LLMs. FactCHD comprises a diverse dataset encompassing multiple factual patterns, namely basic-type, multi-hop, comparative, and set-operation-based tasks. A unique strength of FactCHD is its integration of fact-based evidence chains, which greatly enhances the evaluation of the interpretive depth of hallucination detectors. Experiments conducted across various LLMs demonstrate the inadequacies of existing approaches for accurately detecting factual errors. Relevant code for this benchmark dataset is accessible at https://github.com/zjunlp/FactHD.
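As a rough illustration of how a detector might be scored against a benchmark organised by factual pattern, the sketch below computes per-pattern accuracy and an F1 score for the hallucinated class. The record fields (`pattern`, `gold`, `pred`), the label strings, and the toy records are all hypothetical; they are not taken from the FactCHD release, and the benchmark's own evaluation protocol may differ.

```python
from collections import defaultdict

# Hypothetical toy records. Field names and label strings are illustrative
# assumptions, not the dataset's actual schema.
SAMPLES = [
    {"pattern": "basic", "gold": "NON-FACTUAL", "pred": "NON-FACTUAL"},
    {"pattern": "multi-hop", "gold": "FACTUAL", "pred": "NON-FACTUAL"},
    {"pattern": "comparative", "gold": "NON-FACTUAL", "pred": "FACTUAL"},
    {"pattern": "set-operation", "gold": "FACTUAL", "pred": "FACTUAL"},
    {"pattern": "multi-hop", "gold": "NON-FACTUAL", "pred": "NON-FACTUAL"},
]


def accuracy_by_pattern(samples):
    """Return the fraction of correct predictions for each factual pattern."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for s in samples:
        total[s["pattern"]] += 1
        if s["gold"] == s["pred"]:
            correct[s["pattern"]] += 1
    return {p: correct[p] / total[p] for p in total}


def f1_for_label(samples, label="NON-FACTUAL"):
    """Return the F1 score, treating `label` as the positive class."""
    tp = sum(1 for s in samples if s["gold"] == label and s["pred"] == label)
    fp = sum(1 for s in samples if s["gold"] != label and s["pred"] == label)
    fn = sum(1 for s in samples if s["gold"] == label and s["pred"] != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(accuracy_by_pattern(SAMPLES))
    print(f1_for_label(SAMPLES))
```

Breaking the score down by pattern makes it visible when a detector handles basic cases but degrades on multi-hop or comparative ones, which is the kind of gap the description above points to.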

Provider:

maas

Created:

2024-02-01

Collected summary

Dataset introduction