cert-framework/human-confabulation-benchmark

Name: cert-framework/human-confabulation-benchmark
Creator: cert-framework
Published: 2026-04-22 07:52:22
License: No description available

Source: Hugging Face. Updated 2026-04-22; indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/cert-framework/human-confabulation-benchmark

Description:

Every major hallucination benchmark generates its false content by prompting an LLM. This dataset takes the opposite approach: a human writes plausible-sounding but entirely false responses from memory, without consulting any source, producing confabulations in the neuropsychological sense (Berlyne, 1972). The result: embedding-based detection methods that achieve 88–97% accuracy on LLM-generated benchmarks (HaluEval) drop to 69–78% on these human confabulations. The distributional hypothesis (Harris, 1954) explains why: sentence embeddings encode co-occurrence patterns, not referential truth. Confabulations that stay within the register of their domain are therefore invisible to cosine-similarity methods. The dataset contains 212 question–response pairs across nine knowledge domains, each with a verified grounded response and a human-written confabulation for the same question.
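The cosine-similarity point can be illustrated with a minimal sketch. It makes several assumptions: a toy bag-of-words count vector stands in for a real sentence encoder, the three example strings are invented for illustration and are not taken from the dataset, and the `flag_as_false` rule with its `threshold` value is hypothetical rather than the detection method evaluated by the dataset authors.

```python
import math
import re
from collections import Counter


def bow_vector(text):
    """Toy 'embedding': lowercase word counts (a stand-in for a sentence encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def flag_as_false(reference, candidate, threshold=0.5):
    """Hypothetical rule: flag a candidate that is dissimilar to the reference."""
    return cosine(bow_vector(reference), bow_vector(candidate)) < threshold


# Invented example strings, NOT taken from the dataset.
grounded = "The mitochondria produce most of the cell's ATP through oxidative phosphorylation."
in_domain_false = "The ribosomes produce most of the cell's ATP through oxidative phosphorylation."
off_domain_false = "Jazz festivals in Montreal attract thousands of visitors every summer."

sim_in = cosine(bow_vector(grounded), bow_vector(in_domain_false))
sim_off = cosine(bow_vector(grounded), bow_vector(off_domain_false))

print(f"in-domain false answer:  {sim_in:.3f}")
print(f"off-domain false answer: {sim_off:.3f}")
```

In this toy setup the false answer that keeps the domain vocabulary scores close to the grounded answer and is not flagged, while the off-topic one is. A real sentence encoder is far richer than word counts, but the description's argument is that it shares the same blind spot: it measures how text is distributed, not whether it is true.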

Provider:

cert-framework
