RadGraph-XL: A Large-Scale Expert-Annotated Dataset for Entity and Relation Extraction from Radiology Reports
Source: DataCite Commons; updated 2025-09-12; indexed 2026-05-04
Download link:
https://physionet.org/content/radgraph-xl/1.0.0/
Description:
Radiology reports are essential for clinical care but pose challenges for
automated processing due to their unstructured nature. Existing datasets like
RadGraph-1.0 focus narrowly on chest X-rays (CXR), limiting their
applicability. We introduce RadGraph-XL, a large-scale, expert-annotated
dataset of 2,300 radiology reports with over 410,000 labeled entities and
relations, spanning four anatomy-modality pairs: chest computed tomography
(CT), abdomen/pelvis CT, brain magnetic resonance imaging (MR), and CXR.
Each report is annotated by board-certified radiologists using a detailed
schema that captures observations, anatomical references, and their
relationships. A novel post-processing step identifies measurement-related
entities, a clinically valuable category. Models trained on RadGraph-XL
outperform prior methods and GPT-4, and generalize well to out-of-domain data
such as deep vein thrombosis (DVT) ultrasound reports.
RadGraph-XL is released publicly with models and annotations to support
applications in clinical natural language processing (NLP), medical imaging
artificial intelligence, and foundation model evaluation, setting a new
benchmark for structured information extraction in radiology.
Provider:
PhysioNet
Created:
2025-08-29



