Comparison of LLM Hallucination Benchmarks
Source: DataCite Commons. Updated: 2024-09-19. Indexed: 2025-04-16.
Download link:
https://orkg.org/comparison/R737437
Resource description:
This comparison covers the most representative state-of-the-art (SOTA) LLM hallucination evaluation and detection benchmarks as of the end of 2023, based on the survey "A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions" by Lei Huang et al. The benchmarks fall into two groups: 1) Hallucination Evaluation benchmarks, which are devised to assess LLMs' tendency to produce hallucinations, with a particular emphasis on identifying factual inaccuracies (i.e., Factuality hallucinations) and measuring deviations from original contexts (i.e., Faithfulness hallucinations); and 2) Hallucination Detection benchmarks, for which most studies have concentrated on task-specific hallucinations, such as those arising in abstractive summarization, data-to-text generation, and machine translation.
Factuality hallucinations in LLMs refer to instances where the generated content contains factual inaccuracies, inconsistencies, or fabrications that deviate from reality or established knowledge. These hallucinations can be categorized into two primary types: factual inconsistency, where the output includes contradictory facts, and factual fabrication, where the content contains unsupported or fictional information. On the other hand, Faithfulness hallucinations in LLMs refer to instances where the model's outputs deviate from user instructions or provided contextual information. They are subdivided into three subtypes: Instruction inconsistency, Context inconsistency, and Logical inconsistency.
Provided by:
Open Research Knowledge Graph
Created:
2024-09-19



