TruthEval: A Dataset to Evaluate LLM Truthfulness and Reliability
Source: DataCite Commons. Updated: 2025-11-20. Indexed: 2025-04-09.
Download link:
https://borealisdata.ca/citation?persistentId=doi:10.5683/SP3/5MZWBV
Description:
Evaluating Large Language Models (LLMs) is currently one of the most important areas of research, as existing benchmarks have proven insufficient and not fully representative of LLMs' varied capabilities. We present TruthEval, a curated collection of challenging statements on sensitive topics for LLM benchmarking. The statements were curated by hand and have known truth values. The categories were chosen to distinguish LLMs' abilities from their stochastic nature. Details of the collection method and use cases can be found in the paper "TruthEval: A Dataset to Evaluate LLM Truthfulness and Reliability" (https://arxiv.org/abs/2406.01855).
Provider:
Borealis
Created:
2024-07-29



