
MedVAL-Bench: Expert-Annotated Medical Text Validation Benchmark

Source: DataCite Commons. Updated: 2025-11-04. Indexed: 2026-05-04.
Download link:
https://physionet.org/content/medval-bench/1.0.0/
Description:
MedVAL-Bench is a dataset containing physician evaluations of errors in language model (LM)-generated medical text. The dataset spans 6 diverse medical text generation tasks and includes annotations from 12 physicians on clinically significant errors for 840 LM-generated outputs. These text-to-text generation tasks involve transforming an input medical text into an output relevant to a specific use case. Each task includes inputs and corresponding LM-generated outputs, which are evaluated for factual consistency by physicians. Importantly, the MedVAL framework and dataset are designed to rely only on inputs for the evaluation process to allow working with datasets that may not have reference outputs, ensuring broad applicability. The evaluation process aims to determine whether the output is factually consistent with the input and is safe for use. MedVAL-Bench constitutes the first large-scale physician-validated benchmark with triage-style risk grading aligned to real-world clinical decision-making, supporting the development of automated, expert-aligned evaluation methods and facilitating research toward trustworthy medical text generation.
Provider:
PhysioNet
Created:
2025-10-29