RadGraph-XL: A Large-Scale Expert-Annotated Dataset for Entity and Relation Extraction from Radiology Reports

Source: DataCite Commons; updated 2025-09-12; indexed 2026-05-04
Download link:
https://physionet.org/content/radgraph-xl/1.0.0/
Description:
Radiology reports are essential for clinical care, but their unstructured nature makes them hard to process automatically. Existing datasets such as RadGraph-1.0 cover only chest X-rays (CXR), which limits their applicability. We introduce RadGraph-XL, a large-scale, expert-annotated dataset of 2,300 radiology reports with over 410,000 labeled entities and relations, spanning four anatomy-modality pairs: chest computed tomography (CT), abdomen/pelvis CT, brain magnetic resonance imaging (MR), and CXR. Each report is annotated by board-certified radiologists using a detailed schema that captures observations, anatomical references, and the relationships between them. A novel post-processing step identifies measurement-related entities, a clinically valuable category. Models trained on RadGraph-XL outperform prior methods and GPT-4, and generalize well to out-of-domain data such as deep vein thrombosis (DVT) ultrasound reports. RadGraph-XL is released publicly with models and annotations to support applications in clinical natural language processing (NLP), medical imaging artificial intelligence, and foundation model evaluation, setting a new benchmark for structured information extraction in radiology.

Provider:
PhysioNet
Created:
2025-08-29