FactCHD: Hallucination Detection Benchmark
ModelScope community. Updated 2025-09-07; indexed 2024-05-15.
Download link:
https://modelscope.cn/datasets/ZJUNLP/FactCHD
Resource summary:
Despite their impressive generative capabilities, Large Language Models (LLMs) face limitations in real-world applications due to factual conflict hallucinations. Particularly in complex reasoning scenarios, accurately detecting hallucinations in content produced by LLMs remains a relatively under-explored research direction. We thus introduce FactCHD, a benchmark dataset dedicated to detecting factual conflict hallucinations in LLMs. FactCHD comprises a diverse dataset encompassing multiple factual patterns, namely basic-type, multi-hop, comparative, and set-operation-based tasks. A unique strength of FactCHD is its integration of fact-based evidence chains, which greatly enhances the evaluation of the interpretive depth of hallucination detectors. Experiments conducted across various LLMs demonstrate the inadequacies of existing approaches for accurately detecting factual errors. Relevant code for this benchmark dataset is accessible at https://github.com/zjunlp/FactHD.
Provider:
maas
Created:
2024-02-01
Compiled Summary
Dataset Overview

Background and Challenges
Background Overview
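FactCHD is a benchmark dataset for detecting fact-conflicting hallucinations in large language models. It covers multiple factual patterns and integrates fact-based evidence chains to deepen the evaluation of detector explanations. The dataset contains a training set and a test set, both in English.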
The content above was collected and summarized by 遇见数据集.



