Annotation dataset of problematic opioid use and related contexts from MIMIC-III Critical Care Database discharge summaries

Source: physionet.org (indexed 2025-03-24)
Download link:
https://physionet.org/content/annotation-opioid-use-notes/1.0.0/
Description:
Opioid use disorder (OUD) is underdiagnosed in health system settings, limiting research on OUD using electronic health records (EHRs). Medical encounter notes can enrich structured EHR data with documented signs and symptoms of OUD and social risks and behaviors. To capture this information at scale, natural language processing tools must be developed and evaluated. We conducted a pilot study that aimed to 1) develop and apply an annotation schema to deeply characterize OUD and related clinical, behavioral, and environmental factors; and 2) automate the annotation schema using machine learning and deep learning-based approaches. De-identified patient data for this study included hospital discharge summaries of patients with International Classification of Diseases (ICD-9) OUD diagnostic codes, obtained from the MIMIC-III Critical Care Database. We developed an annotation schema to characterize problematic opioid use, identify individuals with potential OUD, and provide psychosocial context. The final annotation schema contained 33 classes. Two annotators reviewed discharge summaries from a random sample of 100 of these patients. The first corpus of 40 patients was reviewed by both annotators. We achieved moderate inter-annotator agreement, with F1-scores across all classes increasing from 48% to 66%. The second corpus of 60 patients was reviewed by a single annotator. The shared database contains the resulting 3,270 annotations with the note identifier, span offset with accompanying text snippet, and class assignments and may be useful to future development of natural language processing systems related to OUD.
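
The description reports inter-annotator agreement as an F1-score but does not say how spans were matched. As a minimal sketch, assuming exact matching on note identifier, span offsets, and class, pairwise F1 between two annotators could be computed as below. The `Annotation` field names and the placeholder class labels are hypothetical and are not taken from the dataset's files.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    note_id: str  # hypothetical field name for the note identifier
    start: int    # span start offset, in characters
    end: int      # span end offset (exclusive)
    label: str    # one of the schema's classes (placeholder values here)


def pairwise_f1(a: set[Annotation], b: set[Annotation]) -> float:
    """F1 between two annotators, counting only exact span-and-class matches.

    Annotator `a` is treated as the reference and `b` as the prediction;
    F1 is symmetric, so swapping them gives the same value.
    """
    if not a and not b:
        return 1.0  # neither annotator marked anything: trivial agreement
    matched = len(a & b)
    if matched == 0:
        return 0.0
    precision = matched / len(b)
    recall = matched / len(a)
    return 2 * precision * recall / (precision + recall)
```

A relaxed variant that counts overlapping spans as matches would generally give higher scores; which variant the authors used is not stated here.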

Provider:
physionet.org