Annotation dataset of social determinants of health from MIMIC-III Clinical Care Database
Source: physionet.org (indexed 2025-01-22)
Download link:
https://physionet.org/content/annotation-dataset-sdoh/1.0.0/
Description:
Social determinants of health (SDoH) have an important impact on patient outcomes but are incompletely collected in electronic health records (EHRs). This study investigated the ability of large language models to extract SDoH from free text in EHRs, where they are most commonly documented, and explored the role of synthetic clinical text in improving the extraction of these scarcely documented yet extremely valuable clinical data. We developed annotation guidelines for sentence-level annotation of SDoH that are not reliably available as structured data in the EHR: employment, housing, transportation, parental status, relationship, and social support. Each sentence was labeled for two properties: whether it contains an SDoH mention, and whether it contains an adverse SDoH mention. After the annotation guidelines were finalized, two annotators manually annotated a separate corpus of 800 notes, which cannot be released because it contains protected health information (PHI). Of these 800 notes, 300 (37.5%) were dually annotated. Before adjudication, the dually annotated notes had a Krippendorff’s alpha of 0.86 and a Cohen’s kappa of 0.86 for any SDoH mention, and a Krippendorff’s alpha of 0.76 and a Cohen’s kappa of 0.76 for adverse SDoH mentions. As an external validation set, 200 MIMIC-III notes written by physicians, social workers, and nurses were manually annotated by a single annotator. Here, we release this manually annotated corpus of 200 MIMIC-III notes.
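The abstract reports both Krippendorff’s alpha and Cohen’s kappa for the dually annotated notes. As a rough illustration of how these two agreement statistics are computed and how they relate, the Python sketch below evaluates both for two annotators' sentence-level labels. It assumes nominal labels (e.g. 1 = sentence contains an SDoH mention, 0 = it does not), exactly two annotators, and no missing labels. The function names and toy labels are hypothetical, not taken from the released dataset or the authors' code, and the exact way the authors aggregated agreement across the six SDoH categories is not described here.

```python
"""Inter-annotator agreement for two annotators' sentence labels (illustrative sketch)."""
from collections import Counter
from typing import Hashable, Sequence


def _check(a: Sequence[Hashable], b: Sequence[Hashable]) -> None:
    # Both annotators must label the same, non-empty list of sentences.
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("expected two non-empty label sequences of equal length")


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: chance-corrected agreement using each annotator's own label rates.

    Undefined (division by zero) if both annotators use one identical label throughout.
    """
    _check(a, b)
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both annotators independently pick the same label.
    labels = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[c] * counts_b[c] for c in labels) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


def krippendorff_alpha_nominal(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Krippendorff's alpha for nominal labels, two annotators, no missing values."""
    _check(a, b)
    n = 2 * len(a)  # total pairable values (two per sentence)
    # Each disagreeing sentence adds two off-diagonal entries to the coincidence matrix.
    observed_off_diag = 2 * sum(x != y for x, y in zip(a, b))
    # Label frequencies pooled across both annotators.
    pooled = Counter(a) + Counter(b)
    # Sum over ordered label pairs c != k of n_c * n_k.
    expected_off_diag = n * n - sum(v * v for v in pooled.values())
    # alpha = 1 - D_o / D_e, where D_o = observed / n and D_e = expected / (n * (n - 1)).
    return 1 - (n - 1) * observed_off_diag / expected_off_diag


if __name__ == "__main__":
    # Toy labels invented for illustration; NOT drawn from the released corpus.
    annotator_1 = [1, 1, 0, 0, 1, 0, 1, 1]
    annotator_2 = [1, 0, 0, 0, 1, 0, 1, 1]
    print(f"kappa={cohens_kappa(annotator_1, annotator_2):.3f}")  # kappa=0.750
    print(f"alpha={krippendorff_alpha_nominal(annotator_1, annotator_2):.3f}")  # alpha=0.762
```

With two annotators and no missing data, alpha differs from kappa mainly in two ways: it estimates chance agreement from label frequencies pooled across both annotators rather than from each annotator's own rates, and it applies a small-sample correction (the n − 1 factor). On a corpus the size of the dually annotated set the two statistics tend to converge, which is consistent with, though not proof of, the matching values reported above.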
Provided by:
physionet.org



