Comparison of LLM Hallucination Benchmarks

Name: Comparison of LLM Hallucination Benchmarks
Creator: Open Research Knowledge Graph
Published: 2024-09-19 00:27:00
License: No description available

DataCite Commons · Updated 2024-09-19 · Indexed 2025-04-16

Download link:

https://orkg.org/comparison/R737437

Description:

This comparison covers the most representative state-of-the-art (SOTA) benchmarks for evaluating and detecting hallucinations in large language models (LLMs) as of the end of 2023. It is based on the survey "A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions" by Lei Huang et al. and groups the benchmarks into two categories: 1) Hallucination Evaluation benchmarks, which are devised to assess LLMs' tendency to produce hallucinations, with particular emphasis on identifying factual inaccuracies (i.e., factuality hallucinations) and measuring deviations from the original context (i.e., faithfulness hallucinations); and 2) Hallucination Detection benchmarks, for which most studies have concentrated on task-specific hallucinations in areas such as abstractive summarization, data-to-text generation, and machine translation.

Factuality hallucinations in LLMs are instances where the generated content contains factual inaccuracies, inconsistencies, or fabrications that deviate from reality or established knowledge. They fall into two primary types: factual inconsistency, where the output includes contradictory facts, and factual fabrication, where the content contains unsupported or fictional information. Faithfulness hallucinations, by contrast, are instances where the model's output deviates from the user's instructions or from the provided contextual information. They are subdivided into three subtypes: instruction inconsistency, context inconsistency, and logical inconsistency.
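
The two-level taxonomy in this description can be written down as a small lookup structure. The following is a minimal Python sketch for illustration only and is not part of the ORKG comparison; the names `HallucinationType`, `SUBTYPES`, and `parent_type` are hypothetical, and the subtype labels are taken directly from the description.

```python
from enum import Enum


class HallucinationType(Enum):
    """Top-level hallucination categories named in the description."""
    FACTUALITY = "factuality"
    FAITHFULNESS = "faithfulness"


# Subtype labels copied from the description, keyed by their parent type.
# (Hypothetical structure; the comparison itself does not define it.)
SUBTYPES = {
    HallucinationType.FACTUALITY: (
        "factual inconsistency",
        "factual fabrication",
    ),
    HallucinationType.FAITHFULNESS: (
        "instruction inconsistency",
        "context inconsistency",
        "logical inconsistency",
    ),
}


def parent_type(subtype: str) -> HallucinationType:
    """Return the top-level type for a subtype label (case-insensitive)."""
    key = subtype.strip().lower()
    for parent, children in SUBTYPES.items():
        if key in children:
            return parent
    raise ValueError(f"unknown subtype: {subtype!r}")
```

For example, `parent_type("Context inconsistency")` returns `HallucinationType.FAITHFULNESS`, while an unrecognized label raises `ValueError` rather than being assigned a type silently.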

Provider:

Open Research Knowledge Graph

Created:

2024-09-19
