TruthEval

Source: arXiv (2024-06-04); updated 2024-06-21
Download link:
https://github.com/tanny411/TruthEval
Overview:
TruthEval is a dataset created by the School of Computer Science at the University of Waterloo to evaluate the truthfulness and reliability of large language models (LLMs). It contains 885 statements on sensitive topics, divided into six categories: Facts, Conspiracies, Controversies, Misconceptions, Stereotypes, and Fiction. The statements were curated by hand and carry known ground truth labels, so that an LLM's actual capabilities can be distinguished from random behavior. To build the dataset, statements were collected from sources such as Wikipedia and GPT-3 and then refined with semantic deduplication. TruthEval is mainly intended to assess how well LLMs understand world knowledge, in particular how they handle topics whose ground truth is clear-cut as well as topics whose ground truth is ambiguous.
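The listing states only that the collected statements were refined via semantic deduplication; it does not describe the method. A common approach, sketched here under stated assumptions, is greedy near-duplicate removal: embed each statement, then keep a statement only if its cosine similarity to every statement already kept is below a threshold. The function names `cosine` and `semantic_dedup`, the 0.9 threshold, and the toy two-dimensional embeddings are all hypothetical; in practice the vectors would come from a sentence-embedding model, and TruthEval's actual procedure may differ.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_dedup(
    statements: List[str],
    embeddings: List[Sequence[float]],
    threshold: float = 0.9,  # hypothetical cutoff, not taken from TruthEval
) -> List[str]:
    """Hypothetical greedy dedup: keep a statement only if its embedding is
    below `threshold` cosine similarity to every statement kept so far."""
    if len(statements) != len(embeddings):
        raise ValueError("statements and embeddings must have the same length")
    kept_text: List[str] = []
    kept_vecs: List[Sequence[float]] = []
    for text, vec in zip(statements, embeddings):
        if all(cosine(vec, k) < threshold for k in kept_vecs):
            kept_text.append(text)
            kept_vecs.append(vec)
    return kept_text


if __name__ == "__main__":
    # Toy, made-up embeddings: the first two statements point in almost the
    # same direction, so the second is treated as a near-duplicate.
    statements = [
        "The Earth orbits the Sun.",
        "Earth revolves around the Sun.",
        "The Moon landing was staged.",
    ]
    embeddings = [[1.0, 0.0], [0.98, 0.05], [0.0, 1.0]]
    print(semantic_dedup(statements, embeddings))
    # → ['The Earth orbits the Sun.', 'The Moon landing was staged.']
```

Because each candidate is compared with every kept statement, this greedy pass is O(n²) in the number of statements, and the result depends on input order: the first member of each near-duplicate group is the one retained.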
Provided by:
School of Computer Science, University of Waterloo
Created:
2024-06-04