Comparison of LLM Hallucination Benchmarks

Source: DataCite Commons. Updated: 2024-09-19. Indexed: 2025-04-16.
Download link:
https://orkg.org/comparison/R737435
Description:
This comparison covers the most representative state-of-the-art (SOTA) LLM hallucination evaluation and detection benchmarks as of the end of 2023, based on the survey "A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions" by Lei Huang et al. The benchmarks fall into two groups: 1) hallucination evaluation benchmarks, which are devised to assess LLMs' tendency to produce hallucinations, with a particular emphasis on identifying factual inaccuracies (i.e., factuality hallucinations) and measuring deviations from the original context (i.e., faithfulness hallucinations); and 2) hallucination detection benchmarks, for which most studies have concentrated on task-specific hallucinations, such as those in abstractive summarization, data-to-text generation, and machine translation. Factuality hallucinations in LLMs are instances where the generated content contains factual inaccuracies, inconsistencies, or fabrications that deviate from reality or established knowledge. They fall into two primary types: factual inconsistency, where the output includes contradictory facts, and factual fabrication, where the content contains unsupported or fictional information. Faithfulness hallucinations, by contrast, are instances where the model's output deviates from the user's instructions or the provided context. They are subdivided into three subtypes: instruction inconsistency, context inconsistency, and logical inconsistency.
Provider:
Open Research Knowledge Graph
Created:
2024-09-19