Shareable Artificial Intelligence to Extract Cancer Outcomes from Electronic Health Records for Precision Oncology Research
Source: DataCite Commons | Updated: 2024-10-24 | Indexed: 2025-04-16
Download link:
https://physionet.org/content/dfci-cancer-outcomes-ehr/
Description:
Databases that link molecular data to clinical outcomes can inform precision
cancer research into novel prognostic and predictive biomarkers. However,
outside of clinical trials, cancer outcomes are typically recorded only in
text form within electronic health records (EHRs). Artificial intelligence
(AI) models have been trained to extract outcomes from individual EHRs.
However, patient privacy restrictions have historically precluded
dissemination of these models beyond the centers at which they were trained.
In this study, the vulnerability of text classification models trained
directly on protected health information to membership inference attacks was
confirmed. A teacher-student distillation approach was applied to develop
shareable models for annotating outcomes from imaging reports and medical
oncologist notes. 'Teacher' models trained on EHR data from Dana-Farber Cancer
Institute (DFCI) were used to label imaging reports and discharge summaries
from the Medical Information Mart for Intensive Care (MIMIC)-IV dataset.
'Student' models were trained to use these MIMIC documents to predict the
labels assigned by teacher models and sent to Memorial Sloan Kettering (MSK)
for evaluation. The student models exhibited high discrimination across
outcomes in both the DFCI and MSK test sets. These student models,
"DFCI-imaging-student" and "DFCI-medonc-student," are shared here.
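
The teacher-student distillation described above can be illustrated with a toy sketch. This is not the authors' implementation: the keyword-based `teacher_predict`, the bag-of-words logistic-regression student, the example sentences, and all names below are hypothetical stand-ins chosen so that the example runs with the Python standard library alone. The point it shows is that the student is fitted only to public text and to the teacher's soft labels, never to the private training documents.

```python
import math
from collections import Counter


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def featurize(text, vocab):
    """Bag-of-words counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]


def teacher_predict(text):
    """Hypothetical stand-in for a teacher trained on private notes.

    Returns a soft probability that the document describes progression.
    """
    words = text.lower().split()
    score = 2.0 * (words.count("progression") + words.count("enlarging"))
    score -= 2.0 * (words.count("stable") + words.count("decreased"))
    return sigmoid(score)


def train_student(public_docs, vocab, epochs=500, lr=0.5):
    """Fit logistic regression to the teacher's soft labels on public text."""
    X = [featurize(d, vocab) for d in public_docs]
    y = [teacher_predict(d) for d in public_docs]  # soft labels from the teacher
    w = [0.0] * len(vocab)
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(vocab)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of cross-entropy with respect to the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def student_predict(text, vocab, w, b):
    x = featurize(text, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Invented "public" documents standing in for a corpus such as MIMIC-IV.
PUBLIC_DOCS = [
    "enlarging mass consistent with progression",
    "stable nodules no new lesions",
    "decreased size of lesion",
    "progression of disease with enlarging nodes",
    "stable appearance of chest",
    "lesion decreased and stable",
]
VOCAB = sorted({w for d in PUBLIC_DOCS for w in d.split()})
W, B = train_student(PUBLIC_DOCS, VOCAB)
```

Calling `student_predict(doc, VOCAB, W, B)` then scores a new document using only the student's weights, which in this setup are the only artifact that would be shared.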
Provider:
PhysioNet
Created:
2024-10-15



