
Annotation dataset of problematic opioid use and related contexts from MIMIC-III Critical Care Database discharge summaries

Source: DataCite Commons; updated 2023-02-08; indexed 2025-04-16
Download link:
https://physionet.org/content/annotation-opioid-use-notes/
Description:
Opioid use disorder (OUD) is underdiagnosed in health system settings, limiting research on OUD using electronic health records (EHRs). Medical encounter notes can enrich structured EHR data with documented signs and symptoms of OUD and with social risks and behaviors. To capture this information at scale, natural language processing (NLP) tools must be developed and evaluated. We conducted a pilot study that aimed to 1) develop and apply an annotation schema to deeply characterize OUD and related clinical, behavioral, and environmental factors; and 2) automate the annotation schema using machine learning and deep learning-based approaches. De-identified patient data for this study included hospital discharge summaries of patients with International Classification of Diseases, Ninth Revision (ICD-9) OUD diagnostic codes, obtained from the MIMIC-III Critical Care Database. We developed an annotation schema to characterize problematic opioid use, identify individuals with potential OUD, and provide psychosocial context. The final annotation schema contained 33 classes. Two annotators reviewed discharge summaries from a random sample of 100 of these patients. The first corpus, covering 40 patients, was reviewed by both annotators; we achieved moderate inter-annotator agreement, with F1-scores across all classes increasing from 48% to 66%. The second corpus, covering 60 patients, was reviewed by a single annotator. The shared database contains the resulting 3,270 annotations, each with its note identifier, span offsets and accompanying text snippet, and class assignment, and may be useful for future development of NLP systems related to OUD.
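Inter-annotator agreement is reported above as an F1-score across all classes. A minimal sketch of how such a score can be computed between two annotators follows, assuming exact matching on note identifier, character offsets, and class. The `Annotation` record, its field names, and the class labels in the usage lines are hypothetical; the description does not state the released file layout or whether the authors matched spans exactly or by overlap.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass(frozen=True)
class Annotation:
    """One annotated span (hypothetical field names; check the released files)."""
    note_id: int   # identifier of the discharge summary (assumed integer)
    start: int     # character offset where the span begins
    end: int       # character offset where the span ends (assumed exclusive)
    label: str     # one of the schema's 33 classes (names here are made up)
    # The text snippet is kept for reference but excluded from equality/hashing,
    # so two annotations match on note, offsets, and class alone.
    text: str = field(default="", compare=False)


def agreement_f1(a: Iterable[Annotation], b: Iterable[Annotation]) -> float:
    """Micro-averaged F1 between two annotators using exact span-and-class matching.

    Treating annotator A as reference and B as prediction, precision = |A & B| / |B|
    and recall = |A & B| / |A|; their harmonic mean simplifies to
    2 * |A & B| / (|A| + |B|), which is symmetric in A and B.
    """
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty annotation sets agree trivially
    matches = len(set_a & set_b)
    return 2 * matches / (len(set_a) + len(set_b))


# Usage with made-up spans: the second pair differs in its end offset, so only
# one of two annotations per annotator matches.
ann1 = [Annotation(1, 10, 20, "opioid_misuse"), Annotation(1, 30, 42, "social_risk")]
ann2 = [Annotation(1, 10, 20, "opioid_misuse"), Annotation(1, 30, 40, "social_risk")]
print(f"{agreement_f1(ann1, ann2):.2f}")  # 0.50
```

Exact matching is the strictest choice; a lenient variant would count two spans in the same note and class as matching whenever their offset ranges overlap, which typically yields higher agreement on boundary-sensitive classes.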

Provider:
PhysioNet
Created:
2023-02-05