MIMIC-IV-Ext-MedicalBench: Evaluating Large Language Models Towards Improved Medical Concept Extraction

Name: MIMIC-IV-Ext-MedicalBench: Evaluating Large Language Models Towards Improved Medical Concept Extraction
Creator: PhysioNet
Published: 2026-03-24 00:34:22
License: No description available

Source: DataCite Commons. Updated: 2026-03-24. Indexed: 2026-05-04.

Download link:

https://physionet.org/content/mimic-iv-ext-medicalbench/1.0.0/

Description:

Medical concept extraction from electronic health records underpins many downstream applications, yet it remains challenging because medically meaningful concepts, such as diagnoses, are frequently implied rather than explicitly stated in medical narratives. Existing benchmarks with human-annotated evidence spans underscore the importance of grounding extracted concepts in medical text. However, they predominantly focus on explicitly stated concepts and provide limited coverage of cases in which medically relevant concepts must be inferred. We present MedicalBench, a new benchmark for medical concept extraction with evidence grounding that evaluates implicit medical reasoning. MedicalBench formulates concept extraction as a verification task over medical note-concept pairs, coupled with sentence-level evidence identification. Built from MIMIC-IV discharge summaries and human-verified ICD-10 codes, the dataset is curated through a multi-stage large language model (LLM) triage pipeline followed by dual medical annotation and expert review. It deliberately includes implicit positives, semantically confusable negatives, and cases where LLM judgments disagree with human assessments. Annotators provide sentence-level evidence spans and concise medical rationales. In total, the dataset contains 405 high-quality examples covering a broad range of ICD-10 chapters. By providing ground-truth evidence and confusable alternatives, MedicalBench enables rigorous evaluation of not only _whether_ a model can extract the correct concept, but also _why_, rewarding solutions that can highlight relevant evidence and reject plausible-but-incorrect diagnoses and procedures.
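
To make the task formulation concrete, the following is a minimal sketch of how a prediction on note-concept pairs might be scored on both verification and sentence-level evidence. The `Example` class, its field names, and the choice of accuracy and set-based F1 are assumptions made for illustration; they do not reflect the dataset's actual file schema or its official metrics.

```python
from dataclasses import dataclass, field


@dataclass
class Example:
    """One note-concept pair. All field names here are hypothetical."""
    gold_label: bool                                        # does the concept apply to the note?
    gold_evidence: set[int] = field(default_factory=set)   # indices of supporting sentences
    pred_label: bool = False                                # model's verification decision
    pred_evidence: set[int] = field(default_factory=set)   # sentences the model cites


def verification_accuracy(examples):
    """Fraction of pairs whose predicted label matches the gold label."""
    if not examples:
        return 0.0
    return sum(e.pred_label == e.gold_label for e in examples) / len(examples)


def evidence_f1(gold, pred):
    """Sentence-level F1 between gold and predicted evidence index sets."""
    if not gold and not pred:
        return 1.0  # nothing to find and nothing claimed
    overlap = len(gold & pred)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy predictions (invented, not taken from the dataset).
examples = [
    Example(True, {2, 5}, True, {2, 7}),  # correct label, partially correct evidence
    Example(False, set(), True, {1}),     # confusable negative wrongly accepted
    Example(True, {0}, True, {0}),        # correct label and evidence
]

accuracy = verification_accuracy(examples)
mean_f1 = sum(evidence_f1(e.gold_evidence, e.pred_evidence) for e in examples) / len(examples)
print(f"verification accuracy: {accuracy:.3f}")
print(f"mean evidence F1: {mean_f1:.3f}")
```

Scoring the two parts separately keeps a model that guesses the right label from the wrong sentences distinguishable from one that grounds its answer correctly, which is the distinction the description emphasizes.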

Provider:

PhysioNet

Created:

2026-03-18
