HalluDial
Source: arXiv | Updated: 2024-06-11 | Indexed: 2024-06-21
Download link:
https://github.com/FlagOpen/HalluDial
Description:
HalluDial is a large-scale benchmark for dialogue-level hallucination evaluation, created by the Beijing Academy of Artificial Intelligence. It contains 4,094 dialogues with 146,856 samples in total and covers both spontaneous and induced hallucination scenarios, including factuality and faithfulness hallucinations. The dataset was built through diverse dialogue sampling and automatic hallucination annotation, and is intended to assess how well large language models (LLMs) can evaluate hallucinations in information-seeking dialogues. Its main application is automatic dialogue-level hallucination evaluation, which targets inaccurate or misleading content in LLM output.
Provider:
Beijing Academy of Artificial Intelligence
Created:
2024-06-11



