CLIP: A Dataset for Extracting Action Items for Physicians from Hospital Discharge Notes
Source: physionet.org (indexed 2025-03-22)
Download link:
https://physionet.org/content/mimic-iii-clinical-action/1.0.0/
Description:
We created a dataset of clinical action items annotated over MIMIC-III. This dataset, which we call CLIP, is annotated by physicians and covers 718 discharge summaries, representing 107,494 sentences. Annotations were collected as character-level spans over the discharge summaries, after surrogate generation had been applied to fill the anonymized templates in the MIMIC-III text with synthetic data. We release these spans, their aggregation into sentence-level labels, and the sentence tokenizer used to aggregate the spans and label the sentences. To enable reproduction, we also release the surrogate data generator and the document IDs used for the training, validation, and test splits. Each span is annotated with zero or more labels drawn from seven types, representing the different actions that may need to be taken: Appointment, Lab, Procedure, Medication, Imaging, Patient Instructions, and Other. We encourage the community to use this dataset to develop methods for automatically extracting clinical action items from discharge summaries.
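The aggregation of character-level spans into sentence-level labels can be sketched roughly as follows. This is a minimal illustration, not the released code: the span layout `(start, end, label)`, the sentence offsets, the function name `aggregate_spans`, and the rule that any character overlap assigns the label to the sentence are all assumptions made for the example. Only the seven label names come from the description above.

```python
from typing import List, Set, Tuple

# The seven label types named in the dataset description.
LABEL_TYPES = [
    "Appointment", "Lab", "Procedure", "Medication",
    "Imaging", "Patient Instructions", "Other",
]

# Hypothetical layouts; the released files may store these differently.
Span = Tuple[int, int, str]        # (start_char, end_char_exclusive, label)
SentenceOffsets = Tuple[int, int]  # (start_char, end_char_exclusive)


def aggregate_spans(
    sentences: List[SentenceOffsets], spans: List[Span]
) -> List[Set[str]]:
    """Give each sentence the set of labels of every span that overlaps it.

    Assumed rule: any character overlap counts. A sentence with no
    overlapping span keeps an empty label set.
    """
    labels: List[Set[str]] = [set() for _ in sentences]
    for span_start, span_end, label in spans:
        if label not in LABEL_TYPES:
            raise ValueError(f"unknown label type: {label!r}")
        for i, (sent_start, sent_end) in enumerate(sentences):
            # Half-open intervals overlap when each starts before the other ends.
            if span_start < sent_end and sent_start < span_end:
                labels[i].add(label)
    return labels


if __name__ == "__main__":
    # Three sentences given as character offsets into one document.
    sentence_offsets = [(0, 37), (38, 58), (59, 78)]
    # Two annotated spans; the third sentence carries no action item.
    annotated_spans = [(0, 36, "Appointment"), (38, 57, "Lab")]
    for offsets, sentence_labels in zip(
        sentence_offsets, aggregate_spans(sentence_offsets, annotated_spans)
    ):
        print(offsets, sorted(sentence_labels))
```

Because a span may carry any of the seven types and a sentence may overlap several spans, the result is naturally multi-label: each sentence maps to a set that may be empty or contain more than one type.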
Provider:
physionet.org



