clam004/antihallucination_dataset
Source: Hugging Face. Updated 2024-04-10. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/clam004/antihallucination_dataset
Resource description:
Each sample starts with the special sequence `<truth>`, followed by a gold reference document (the most accurate text available), which ends with the sequence `<generated>`. Next comes the generated text, which may contain hallucinations; it starts with `<generated>` and ends with `<eval>`. The model is trained to reproduce the generated text and to append an accuracy label at the end of each paragraph, chosen from three levels: `<accurate>`, `<minor_inaccurate>`, and `<major_inaccurate>`. Once the labeling is complete, the model emits the stop sequence `<stop>`.
Provided by:
clam004
Summary of original information
Dataset overview
Data structure
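- Each sample starts with the special sequence `<truth>`, followed by the text that serves as the highest-accuracy reference document, and ends with `<generated>`.
- The generated text immediately follows the `<generated>` tag, may contain hallucinated content, and ends with `<eval>`.
- The model learns to repeat the generated text and to add a label at the end of each paragraph indicating that paragraph's accuracy level.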
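The layout described above can be sketched as a small helper that assembles one training string. This is only an illustration under stated assumptions: the function name `build_sample`, the use of a single `<generated>` marker between the reference and the generated text, the placement of the labeled copy directly after `<eval>`, and the newline used to separate paragraphs are all hypothetical choices, since the listing does not specify whitespace or paragraph delimiters.

```python
# Special sequences named in the dataset description.
ALLOWED_LABELS = {"<accurate>", "<minor_inaccurate>", "<major_inaccurate>"}


def build_sample(truth: str, paragraphs: list[str], labels: list[str]) -> str:
    """Assemble one sample string (hypothetical helper, not the dataset's own code).

    Assumptions: paragraphs are joined with a newline, and the labeled
    copy of the generated text is placed between <eval> and <stop>.
    """
    if len(paragraphs) != len(labels):
        raise ValueError("need exactly one label per paragraph")
    for label in labels:
        if label not in ALLOWED_LABELS:
            raise ValueError(f"unknown label: {label}")

    # The generated text as it appears before <eval>, without labels.
    generated = "\n".join(paragraphs)
    # The repeated text, with each paragraph followed by its accuracy label.
    evaluated = "\n".join(p + l for p, l in zip(paragraphs, labels))
    return f"<truth>{truth}<generated>{generated}<eval>{evaluated}<stop>"
```

For example, calling `build_sample` with one reference sentence and two generated paragraphs labeled `<accurate>` and `<major_inaccurate>` yields a single string that begins with `<truth>` and ends with `<stop>`.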
Accuracy labels
- `<accurate>`: accurate
- `<minor_inaccurate>`: minor inaccuracy
- `<major_inaccurate>`: major inaccuracy
End marker
- Once the model has finished labeling, it emits the stop sequence `<stop>` to signal the end.
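To read the labels back out of a model's output, one possible approach is to cut the text at the first `<stop>` and split it on the three accuracy labels. The sketch below assumes the input is the portion produced after `<eval>`; the name `parse_eval` and the whitespace stripping are illustrative assumptions, not part of the dataset.

```python
import re

LABELS = ("<accurate>", "<minor_inaccurate>", "<major_inaccurate>")
# Capturing group so that re.split keeps the matched labels in its result.
_LABEL_RE = re.compile("(" + "|".join(re.escape(t) for t in LABELS) + ")")


def parse_eval(output: str) -> list[tuple[str, str]]:
    """Return (paragraph, label) pairs from text emitted after <eval>.

    Hypothetical helper: anything after the first <stop> is ignored, and
    trailing text that has no label is dropped.
    """
    body = output.split("<stop>", 1)[0]
    # With a capturing group the result alternates: text, label, text, label, ...
    parts = _LABEL_RE.split(body)
    return [(text.strip(), label) for text, label in zip(parts[0::2], parts[1::2])]
```

Because each paragraph carries exactly one label, the returned list can be scanned directly, for instance to check whether any paragraph was marked `<major_inaccurate>`.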



