Shareable Artificial Intelligence to Extract Cancer Outcomes from Electronic Health Records for Precision Oncology Research
Source: physionet.org. Indexed 2025-01-21.
Download link:
https://physionet.org/content/dfci-cancer-outcomes-ehr/1.0.0/
Description:
Databases that link molecular data to clinical outcomes can inform precision cancer research into novel prognostic and predictive biomarkers. However, outside of clinical trials, cancer outcomes are typically recorded only in text form within electronic health records (EHRs). Artificial intelligence (AI) models have been trained to extract outcomes from individual EHRs. However, patient privacy restrictions have historically precluded dissemination of these models beyond the centers at which they were trained. In this study, the vulnerability of text classification models trained directly on protected health information to membership inference attacks was confirmed. A teacher-student distillation approach was applied to develop shareable models for annotating outcomes from imaging reports and medical oncologist notes. ‘Teacher’ models trained on EHR data from Dana-Farber Cancer Institute (DFCI) were used to label imaging reports and discharge summaries from the Medical Information Mart for Intensive Care (MIMIC)-IV dataset. ‘Student’ models were trained to use these MIMIC documents to predict the labels assigned by teacher models and sent to Memorial Sloan Kettering (MSK) for evaluation. The student models exhibited high discrimination across outcomes in both the DFCI and MSK test sets. These student models, “DFCI-imaging-student” and “DFCI-medonc-student,” are shared here.
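The teacher-student distillation idea described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `teacher_predict` is a hypothetical keyword rule standing in for a model trained on private data, the "public" documents are invented sentences rather than MIMIC-IV text, and the student is a small bag-of-words logistic regression, not the architecture of the released "DFCI-imaging-student" or "DFCI-medonc-student" models. The only point it shows is that the student is fit to the teacher's soft labels on public documents and never sees the teacher's training data.

```python
import math

# Hypothetical keyword lists for the stand-in teacher (illustration only).
POSITIVE = {"progression", "enlarging", "new"}
NEGATIVE = {"stable", "unchanged", "resolved"}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tokenize(text):
    return text.lower().split()

def teacher_predict(text):
    """Stand-in for a teacher trained on private data: returns P(outcome = 1)."""
    tokens = set(tokenize(text))
    score = 2.0 * (len(tokens & POSITIVE) - len(tokens & NEGATIVE))
    return sigmoid(score)

def student_predict(model, text):
    """Bag-of-words logistic regression; unseen tokens contribute zero."""
    weights, bias = model
    logit = bias + sum(weights.get(tok, 0.0) for tok in tokenize(text))
    return sigmoid(logit)

def train_student(public_docs, soft_labels, epochs=200, lr=0.2):
    """Fit the student to the teacher's soft labels with plain SGD."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for doc, target in zip(public_docs, soft_labels):
            pred = student_predict((weights, bias), doc)
            err = pred - target  # gradient of cross-entropy w.r.t. the logit
            bias -= lr * err
            for tok in tokenize(doc):
                weights[tok] = weights.get(tok, 0.0) - lr * err
    return weights, bias

# Invented "public" documents; the student only ever sees these.
public_docs = [
    "new enlarging mass consistent with progression",
    "stable disease unchanged from prior study",
    "enlarging nodes suggest progression",
    "findings unchanged and stable",
    "lesion resolved on current study",
]

# Step 1: the teacher labels the public documents with probabilities.
soft_labels = [teacher_predict(doc) for doc in public_docs]

# Step 2: the student learns to reproduce those labels from the public text.
student = train_student(public_docs, soft_labels)

for text in ["enlarging mass with progression", "stable and unchanged"]:
    print(f"{text!r}: student P(outcome) = {student_predict(student, text):.3f}")
```

Training on soft labels (probabilities) rather than hard 0/1 labels is a common choice in distillation because it passes along the teacher's uncertainty; whether the released models were trained this way is not stated in the description above.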
Provider: physionet.org



