MedVAL-Bench: Expert-Annotated Medical Text Validation Benchmark

Name: MedVAL-Bench: Expert-Annotated Medical Text Validation Benchmark
Creator: PhysioNet
Published: 2025-11-04 01:32:17
License: No description available

Source: DataCite Commons. Updated 2025-11-04. Indexed 2026-05-04.

Download link:

https://physionet.org/content/medval-bench/1.0.0/

Description:

MedVAL-Bench is a dataset of physician evaluations of errors in language model (LM)-generated medical text. It spans 6 diverse medical text generation tasks and includes annotations from 12 physicians on clinically significant errors in 840 LM-generated outputs. These text-to-text generation tasks transform an input medical text into an output suited to a specific use case. Each task includes inputs and the corresponding LM-generated outputs, which physicians evaluate for factual consistency. The MedVAL framework and dataset are designed to rely only on inputs during evaluation, so they remain applicable to datasets that have no reference outputs. The evaluation aims to determine whether the output is factually consistent with the input and safe for use. MedVAL-Bench constitutes the first large-scale physician-validated benchmark with triage-style risk grading aligned to real-world clinical decision-making, supporting the development of automated, expert-aligned evaluation methods and facilitating research toward trustworthy medical text generation.

Provider:

PhysioNet

Created:

2025-10-29
