HaluBench

Name: HaluBench
Creator: maas
Published: 2025-12-05 16:35:22
License: No description available

ModelScope community, updated 2025-12-05, indexed 2025-05-24

Download link:

https://modelscope.cn/datasets/PatronusAI/HaluBench

Resource overview:

# Dataset Card for HaluBench

## Dataset Details

HaluBench is a hallucination evaluation benchmark of 15k samples. Each sample is a Context-Question-Answer triplet annotated for whether it contains a hallucination. Compared to prior datasets, HaluBench is the first open-source benchmark containing hallucination tasks sourced from real-world domains, including finance and medicine.

We sourced examples from several existing QA datasets, such as FinanceBench, PubmedQA, CovidQA, HaluEval, DROP and RAGTruth, to build the benchmark. From these we constructed tuples of (question, context, answer, label), where label is a binary score that denotes whether the answer contains a hallucination.

- **Curated by:** Patronus AI
- **Language(s) (NLP):** English

## Use

HaluBench can be used to evaluate hallucination detection models. [PatronusAI/Llama-3-Patronus-Lynx-70B-Instruct](https://huggingface.co/PatronusAI/Llama-3-Patronus-Lynx-70B-Instruct) outperforms GPT-4o, Claude-Sonnet and other open-source models on HaluBench. [PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct](https://huggingface.co/PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct) is an 8B variant that has only a ~3% gap compared to GPT-4o.

## Dataset Card Contact

[@sunitha-ravi](https://huggingface.co/sunitha-ravi)
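
As an illustration of the use case described in the card, the following is a minimal sketch of scoring a hallucination detector against (question, context, answer, label) tuples. The `Sample` class, the boolean `hallucinated` encoding of the binary label, the `naive_detector` baseline, and the toy samples are all assumptions made for this example; they are not taken from the dataset or from any Patronus AI code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Sample:
    """One benchmark tuple; field names mirror the card's description."""
    question: str
    context: str
    answer: str
    hallucinated: bool  # assumed encoding of the binary label


def accuracy(samples: Iterable[Sample],
             detector: Callable[[Sample], bool]) -> float:
    """Fraction of samples where the detector agrees with the label."""
    items = list(samples)
    if not items:
        raise ValueError("no samples to evaluate")
    correct = sum(detector(s) == s.hallucinated for s in items)
    return correct / len(items)


def naive_detector(sample: Sample) -> bool:
    """Toy baseline: flag an answer that is not a verbatim span of the context."""
    return sample.answer.lower() not in sample.context.lower()


# Hypothetical samples written for this sketch, not drawn from HaluBench.
toy_samples = [
    Sample("What was revenue in 2022?",
           "Revenue in 2022 was $5 million.", "$5 million", False),
    Sample("What was revenue in 2022?",
           "Revenue in 2022 was $5 million.", "$7 million", True),
    Sample("Which drug was tested?",
           "The trial tested aspirin in adults.", "Ibuprofen", True),
    Sample("Who was studied?",
           "The trial tested aspirin in adults.", "Adult patients", False),
]

print(f"accuracy: {accuracy(toy_samples, naive_detector):.2f}")
```

The last toy sample is a faithful paraphrase that the string-matching baseline wrongly flags, which is the kind of case a learned detector is meant to handle better.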

Provided by:

maas

Created:

2025-05-20

Collected summary

Dataset introduction