CLIP (CLIP: A Dataset for Extracting Action Items for Physicians from Hospital Discharge Notes)
OpenDataLab · Updated 2026-05-31 · Indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/CLIP
Description:
We created a dataset of clinical action items annotated over MIMIC-III. The dataset, which we call CLIP, is annotated by physicians and covers 718 discharge summaries representing 107,494 sentences. Annotations were collected as character-level spans over the discharge summaries after applying surrogate generation, which fills the anonymized templates in MIMIC-III text with faux data. We release these spans, their aggregation into sentence-level labels, and the sentence tokenizer used to aggregate the spans and label sentences. We also release the surrogate data generator and the document IDs for the training, validation, and test splits to enable reproduction. Each span is annotated with 0 or more labels of 7 types, representing the different actions that may need to be taken: appointment, lab, procedure, medication, imaging, patient instructions, and other. We encourage the community to use this dataset to develop methods for automatically extracting clinical action items from discharge summaries.
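The aggregation of character-level spans into sentence-level labels described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Span` structure, the label strings, the naive regex sentence splitter (the dataset ships its own sentence tokenizer, which should be used in practice), and the rule that a sentence inherits the labels of any span overlapping it. It is not the released code.

```python
import re
from dataclasses import dataclass

# The seven action types named in the description. The exact label strings
# used in the released files are an assumption.
LABELS = ["Appointment", "Lab", "Procedure", "Medication",
          "Imaging", "Patient Instructions", "Other"]


@dataclass(frozen=True)
class Span:
    """Hypothetical container for one annotated character-level span."""
    start: int           # inclusive character offset
    end: int             # exclusive character offset
    labels: tuple = ()   # zero or more entries from LABELS


def sentence_offsets(text):
    """Naive stand-in for the released sentence tokenizer.

    Splits on '.', '!', '?' and newlines and returns (start, end)
    character offsets with surrounding whitespace trimmed.
    """
    offsets = []
    for match in re.finditer(r"[^.!?\n]+[.!?]?", text):
        chunk = match.group()
        stripped = chunk.strip()
        if not stripped:
            continue
        start = match.start() + (len(chunk) - len(chunk.lstrip()))
        offsets.append((start, start + len(stripped)))
    return offsets


def aggregate(text, spans):
    """Return (sentence, multi-hot label vector) pairs.

    A sentence receives a label if any span carrying that label
    overlaps the sentence by at least one character (assumed rule).
    """
    rows = []
    for s, e in sentence_offsets(text):
        active = {label
                  for span in spans
                  if span.start < e and span.end > s
                  for label in span.labels}
        rows.append((text[s:e], [int(label in active) for label in LABELS]))
    return rows
```

Because a span may carry zero or more labels, each sentence ends up with a multi-hot vector over the seven types rather than a single class, so a sentence that no span touches gets an all-zero vector.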
Provider:
OpenDataLab
Created:
2022-08-16
Collected Summary
Dataset Introduction

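Background and Challenges
Background Overview
CLIP is a dataset for clinical action item extraction annotated on MIMIC-III. It covers 107,494 sentences from 718 discharge summaries, which physicians annotated with 7 action types. The dataset provides character-level spans, sentence-level labels, and related tools, and is intended to support the development of methods for automatically extracting clinical action items from discharge notes.
The content above was collected and summarized by 遇见数据集.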



