
RadCoref: Fine-tuning coreference resolution for different styles of clinical narratives

Source: DataCite Commons | Updated: 2024-01-30 | Indexed: 2024-07-13
Download link:
https://physionet.org/content/rad-coreference-resolution/1.0.0/
Description:
RadCoref is a small subset of MIMIC-CXR with manually annotated coreference mentions and clusters. The dataset was annotated by a panel of three cross-disciplinary experts with experience in clinical data processing, following the i2b2 annotation scheme with minimal modifications. It consists of Findings and Impression sections extracted from full radiology reports, split into 950, 25, and 200 section documents for training, validation, and testing, respectively. The training and validation sets were annotated by a single annotator. The test set was annotated independently by two annotators, and their annotations were then merged manually by the third annotator. The dataset aims to support coreference resolution on radiology reports. Because MIMIC-CXR has already been de-identified, the dataset contains no protected health information (PHI).
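The annotations consist of mentions grouped into coreference clusters. As an illustration only, the sketch below models one section's annotation as character-offset mention spans grouped into clusters, then expands each cluster into the coreferent mention pairs it implies. The `Mention` class, the helpers `find_mention` and `coreferent_pairs`, and the example sentence are all hypothetical; this is not the i2b2 file format or the format in which RadCoref is distributed.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Mention:
    """A mention span in a report section (hypothetical representation)."""
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    text: str


def find_mention(doc: str, phrase: str, from_pos: int = 0) -> Mention:
    """Locate `phrase` in `doc` and wrap it as a Mention (raises ValueError if absent)."""
    start = doc.index(phrase, from_pos)
    return Mention(start, start + len(phrase), phrase)


def coreferent_pairs(clusters: list[list[Mention]]) -> set[tuple[Mention, Mention]]:
    """Expand each cluster into all unordered mention pairs, earlier mention first."""
    pairs = set()
    for cluster in clusters:
        ordered = sorted(cluster, key=lambda m: (m.start, m.end))
        pairs.update(combinations(ordered, 2))
    return pairs


# Hypothetical Findings-style sentence, not taken from the dataset.
doc = "There is a small left pleural effusion. The effusion is unchanged from prior."
effusion_1 = find_mention(doc, "a small left pleural effusion")
effusion_2 = find_mention(doc, "The effusion")
clusters = [[effusion_2, effusion_1]]  # one cluster; input order does not matter

for a, b in sorted(coreferent_pairs(clusters), key=lambda p: p[0].start):
    print(f"{a.text!r} <-> {b.text!r}")
```

A cluster with a single mention yields no pairs, so singleton mentions contribute nothing to a pairwise view; cluster-level representations like this one are what coreference metrics such as MUC, B³, and CEAF operate on.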

Provider: PhysioNet
Created: 2024-01-26