UTAustin-AIHealth/MedHallu
Source: Hugging Face. Updated: 2025-02-21. Indexed: 2025-04-26.
Download link:
https://hf-mirror.com/datasets/UTAustin-AIHealth/MedHallu
Description:
MedHallu is a benchmark dataset for evaluating how well large language models detect hallucinations in medical question answering. It has two splits: pqa_labeled, with 1,000 high-quality samples drawn from the PubMedQA pqa_labeled split, and pqa_artificial, with 9,000 samples generated by an automated pipeline from the PubMedQA pqa_artificial split. Each sample contains a medical question, its ground truth answer, a hallucinated answer, a difficulty level, and the hallucination category.
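As a rough illustration of the sample structure described above, the following Python sketch turns one sample into two labeled items for a hallucination detection task and scores a list of predictions. The column names ("Question", "Ground Truth", "Hallucinated Answer") and the use of the split name as the configuration argument to `load_dataset` are assumptions that should be checked against the dataset card. The helper names are hypothetical and not part of any official tooling.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed column names; verify them against the dataset card before use.
QUESTION = "Question"
TRUTH = "Ground Truth"
HALLUCINATED = "Hallucinated Answer"

DetectionItem = Tuple[str, str, int]  # (question, answer, label); label 1 = hallucinated


def to_detection_pairs(sample: Dict[str, str]) -> List[DetectionItem]:
    """Turn one sample into two items: the true answer (0) and the hallucinated one (1)."""
    return [
        (sample[QUESTION], sample[TRUTH], 0),
        (sample[QUESTION], sample[HALLUCINATED], 1),
    ]


def detection_accuracy(items: Sequence[DetectionItem], predictions: Sequence[int]) -> float:
    """Fraction of items whose predicted label matches the reference label."""
    if not items or len(items) != len(predictions):
        raise ValueError("items and predictions must be non-empty and of equal length")
    correct = sum(int(pred == item[2]) for item, pred in zip(items, predictions))
    return correct / len(items)


def load_medhallu(config: str = "pqa_labeled"):
    """Download one subset from the Hugging Face Hub (requires `pip install datasets`).

    Passing the subset name as the configuration is an assumption.
    """
    from datasets import load_dataset  # lazy import so the helpers above work offline

    return load_dataset("UTAustin-AIHealth/MedHallu", config)


if __name__ == "__main__":
    # Placeholder record for illustration only; not taken from the dataset.
    demo = {
        QUESTION: "placeholder question",
        TRUTH: "placeholder true answer",
        HALLUCINATED: "placeholder hallucinated answer",
    }
    pairs = to_detection_pairs(demo)
    print(detection_accuracy(pairs, [0, 1]))
```

Pairing each question with both answers gives a balanced binary task, which is one plausible way to use the benchmark; the dataset's own evaluation protocol may differ.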
Provided by:
UTAustin-AIHealth



