TruthEval: A Dataset to Evaluate LLM Truthfulness and Reliability

Name: TruthEval: A Dataset to Evaluate LLM Truthfulness and Reliability
Creator: Borealis
Published: 2025-11-20 06:51:15
License: No description available

DataCite Commons | Updated: 2025-11-20 | Indexed: 2025-04-09

Download link:

https://borealisdata.ca/citation?persistentId=doi:10.5683/SP3/5MZWBV

Description:

Evaluating large language models (LLMs) is currently one of the most important areas of research, and existing benchmarks have proven insufficient and not fully representative of LLMs' various capabilities. We present TruthEval, a curated collection of challenging statements on sensitive topics for LLM benchmarking. The statements were curated by hand and have known truth values. The categories were chosen to distinguish LLMs' abilities from their stochastic nature. Details of the collection method and use cases can be found in the paper "TruthEval: A Dataset to Evaluate LLM Truthfulness and Reliability" (https://arxiv.org/abs/2406.01855).
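
As a rough illustration of how statements with known truth values might be used, the sketch below scores repeated yes/no answers from a model on two things: whether the majority answer matches the known truth value, and how consistent the repeated answers are. It is not the authors' evaluation method. The record fields (`id`, `text`, `truth`), the placeholder statements, and the `score` helper are all hypothetical, and the actual dataset schema may differ.

```python
from collections import Counter

# Hypothetical records: the real TruthEval column names and statements may differ.
statements = [
    {"id": 1, "text": "Placeholder statement A", "truth": True},
    {"id": 2, "text": "Placeholder statement B", "truth": False},
]

# Hypothetical model output: several repeated yes/no answers per statement id.
responses = {
    1: [True, True, True],
    2: [False, True, False],
}

def score(statements, responses):
    """Return (accuracy of the majority answer, mean agreement with the majority)."""
    correct = 0
    agreement = 0.0
    for s in statements:
        answers = responses[s["id"]]
        # Most frequent answer and how many times it was given.
        majority, count = Counter(answers).most_common(1)[0]
        correct += majority == s["truth"]
        agreement += count / len(answers)
    n = len(statements)
    return correct / n, agreement / n

accuracy, consistency = score(statements, responses)
print(f"accuracy={accuracy:.2f} consistency={consistency:.2f}")
```

Reporting consistency separately from accuracy is one simple way to tell a model that reliably knows an answer from one that happens to land on it in a single sample.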

Provider:

Borealis

Created:

2024-07-29
