HaDes
Source: arXiv; indexed 2025-09-30
Download link:
https://github.com/microsoft/hades
Resource overview:
HaDes is an annotated dataset for word-level, reference-free hallucination detection in free-text generation. It consists of perturbed text segments drawn from English Wikipedia articles, with the hallucinated parts marked. To build it, the researchers perturbed the original Wikipedia text and then collected labels through crowdsourcing, using an iterative model-in-the-loop strategy to mitigate label imbalance during annotation. The dataset contains approximately 1 million perturbed text segments.
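To make the word-level, reference-free setup concrete, the sketch below shows one way an annotated segment could be represented and expanded into per-word labels. The field names (`tokens`, `span`, `is_hallucination`) and the sample sentence are hypothetical illustrations, not the actual HaDes file format; check the repository for the real schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AnnotatedSegment:
    """Hypothetical record layout, not the actual HaDes schema."""
    tokens: List[str]        # perturbed text segment, split into words
    span: Tuple[int, int]    # [start, end) word indices of the perturbed span
    is_hallucination: bool   # crowdsourced judgment for that span


def word_labels(segment: AnnotatedSegment) -> List[int]:
    """Expand the span-level judgment into one 0/1 label per word."""
    start, end = segment.span
    if not (0 <= start < end <= len(segment.tokens)):
        raise ValueError("span must lie inside the token list")
    flag = int(segment.is_hallucination)
    return [flag if start <= i < end else 0 for i in range(len(segment.tokens))]


# Made-up example: the word at index 5 is the perturbed, hallucinated one.
example = AnnotatedSegment(
    tokens="The bridge was completed in 1850 by local engineers".split(),
    span=(5, 6),
    is_hallucination=True,
)
labels = word_labels(example)
```

Note that no reference text appears in the record: a detector in this setting sees only the segment itself, which is what "reference-free" means here.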
Provided by:
Internal crowd-sourcing platform



