MedVAL-Bench: Expert-Annotated Medical Text Validation Benchmark
Source: DataCite Commons | Updated: 2025-11-04 | Indexed: 2026-05-04
Download link:
https://physionet.org/content/medval-bench/1.0.0/
Resource description:
MedVAL-Bench is a dataset containing physician evaluations of errors in
language model (LM)-generated medical text. The dataset spans 6 diverse
medical text generation tasks and includes annotations from 12 physicians on
clinically significant errors for 840 LM-generated outputs. These text-to-text
generation tasks involve transforming an input medical text into an output
relevant to a specific use case. Each task includes inputs and corresponding
LM-generated outputs, which are evaluated for factual consistency by
physicians. The MedVAL framework and dataset rely only on the inputs for
evaluation, so they also apply to datasets that lack reference outputs. The evaluation
process aims to determine whether the output is factually consistent with the
input and is safe for use. MedVAL-Bench constitutes the first large-scale
physician-validated benchmark with triage-style risk grading aligned to real-
world clinical decision-making, supporting the development of automated,
expert-aligned evaluation methods and facilitating research toward trustworthy
medical text generation.
Provider:
PhysioNet
Created:
2025-10-29



