Med-HALT

Source: arXiv, indexed 2025-09-30

Download link:

https://medhalt.github.io

Description:

Med-HALT is a multinational benchmark dataset built from medical examinations in several countries, designed to evaluate and reduce hallucination in large language models (LLMs). It draws on medical entrance examination questions from India, Spain, the United States, and Taiwan (China), and covers several reasoning types, including factual, diagnostic, and multi-hop reasoning. In terms of scale, it provides 18,866 samples for the reasoning hallucination tests (RHT) and 4,916 samples for the memory hallucination tests (MHT).
