clam004/antihallucination_dataset

Name: clam004/antihallucination_dataset
Creator: clam004
Published: 2024-04-10 22:53:41
License: No description provided

Hugging Face · Updated 2024-04-10 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/clam004/antihallucination_dataset

Resource summary:

The structure of this dataset is as follows. Each sample starts with the special sequence `<truth>`, followed by a gold reference document, treated as the most accurate text available, which ends at the sequence `<generated>`. Next comes the generated text, which may contain hallucinations; it starts with `<generated>` and ends with `<eval>`. The model is trained to reproduce the generated text and to append an accuracy label at the end of each paragraph, using one of three levels: `<accurate>`, `<minor_inaccurate>`, or `<major_inaccurate>`. Once the annotation is complete, the model emits the stop sequence `<stop>`.
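As a rough illustration of the layout described above, the following Python sketch assembles one sample string. The card does not specify whitespace, so joining paragraphs with blank lines and placing each label after its paragraph with a single space are assumptions, as is the helper name `build_sample`.

```python
from typing import List

# The three accuracy labels named in the dataset card.
LABELS = ("<accurate>", "<minor_inaccurate>", "<major_inaccurate>")


def build_sample(truth: str, paragraphs: List[str], labels: List[str]) -> str:
    """Assemble one training string in the layout the card describes.

    Assumption: paragraphs are joined with blank lines, and each label is
    appended directly after its paragraph, separated by one space.
    """
    if len(paragraphs) != len(labels):
        raise ValueError("need exactly one label per paragraph")
    for label in labels:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
    # Section the model reads: the possibly hallucinated text.
    generated = "\n\n".join(paragraphs)
    # Section the model learns to write: the same text, labeled per paragraph.
    annotated = "\n\n".join(f"{p} {lab}" for p, lab in zip(paragraphs, labels))
    return f"<truth>{truth}<generated>{generated}<eval>{annotated}<stop>"
```

For example, `build_sample("Paris is the capital of France.", ["Paris is the capital of France.", "It is in Spain."], ["<accurate>", "<major_inaccurate>"])` yields a string that starts with `<truth>` and ends with `<major_inaccurate><stop>`.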

Provider:

clam004

Summary of original information

Dataset overview

Data structure

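Each sample begins with the special sequence <truth>, followed by the reference document of highest accuracy, which ends at <generated>.
The generated text follows the <generated> tag, may contain hallucinated content, and ends with <eval>.
The model learns to reproduce the generated text and to append a label at the end of each paragraph indicating that paragraph's accuracy level.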
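A minimal sketch of splitting a sample back into its sections, assuming each delimiter occurs exactly once and in the order described. The function name and the `prompt`/`target` keys are hypothetical; treating everything after `<eval>` as the training target is one plausible reading of "the model learns to reproduce the generated text", not something the card states.

```python
from typing import Dict


def split_sample(sample: str) -> Dict[str, str]:
    """Split one sample into reference, generated text, and annotated target.

    Assumes <truth>, <generated>, and <eval> each appear once, in that order.
    """
    if not sample.startswith("<truth>"):
        raise ValueError("sample must start with <truth>")
    body = sample[len("<truth>"):]
    truth, sep1, rest = body.partition("<generated>")
    generated, sep2, target = rest.partition("<eval>")
    if not sep1 or not sep2:
        raise ValueError("missing <generated> or <eval> delimiter")
    return {
        "truth": truth,
        "generated": generated,
        # Hypothetical prompt/target split: the model sees everything up to
        # and including <eval>, and is trained to produce the rest.
        "prompt": sample[: len(sample) - len(target)],
        "target": target,
    }
```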

Accuracy labels

<accurate>: accurate
<minor_inaccurate>: minor inaccuracy
<major_inaccurate>: major inaccuracy
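To turn an annotated section (the text after <eval>) into per-paragraph labels, one option is to scan for the three tags and take the text before each tag as its paragraph. The integer ids in `LABEL_IDS` and the function name are hypothetical choices for illustration; the dataset itself only defines the tag strings.

```python
import re
from typing import List, Tuple

# Hypothetical integer ids; the dataset only defines the tag strings.
LABEL_IDS = {"<accurate>": 0, "<minor_inaccurate>": 1, "<major_inaccurate>": 2}

_LABEL_RE = re.compile(r"<(accurate|minor_inaccurate|major_inaccurate)>")


def extract_labels(annotated: str) -> List[Tuple[str, int]]:
    """Return (paragraph_text, label_id) pairs from an annotated section.

    Any text after the last label (e.g. an unlabeled trailing paragraph)
    is ignored.
    """
    pairs = []
    start = 0
    for m in _LABEL_RE.finditer(annotated):
        # The paragraph is everything between the previous label and this one.
        text = annotated[start:m.start()].strip()
        pairs.append((text, LABEL_IDS[m.group(0)]))
        start = m.end()
    return pairs
```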

End marker

Once labeling is complete, the model emits the stop sequence <stop> to mark the end.
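At inference time, decoded text may continue past <stop> if decoding is not halted on that sequence. A small post-processing sketch (the function name is hypothetical) that cuts the output at the first <stop> and reports whether it was found:

```python
from typing import Tuple


def truncate_at_stop(output: str, stop: str = "<stop>") -> Tuple[str, bool]:
    """Cut a decoded model output at the first stop sequence.

    Returns the text before the stop sequence and whether it was found.
    """
    idx = output.find(stop)
    if idx == -1:
        return output, False
    return output[:idx], True
```

A missing <stop> suggests generation hit a length limit before the annotation finished, so the last paragraph may lack a label.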
