Automated phenotyping of mild cognitive impairment and Alzheimer's disease and related dementias using electronic health records
Source: DataCite Commons | Updated: 2025-09-26 | Indexed: 2026-02-08
Download link:
https://bdsp.io/content/zmnzwxzrycx3dop6o9br/
Resource description:
**Objectives:** Unstructured and structured data in electronic health records
(EHR) are a rich source of information for research and quality improvement
studies. However, extracting accurate information from EHR is labor-intensive.
Timely and accurate identification of patients with Alzheimer's disease and
related dementias (ADRD) or mild cognitive impairment (MCI) is critical for
improving patient outcomes through early intervention, optimizing care plans,
and reducing healthcare system burdens. Here we introduce an automated EHR
phenotyping model to streamline this process and enable efficient
identification of these conditions.
**Methods:** We analyzed data from 3,626 outpatients seen at two hospitals
between February 2015 and June 2022. Through manual chart review, we
established ground truth labels for the presence or absence of MCI/ADRD
diagnoses. Our model combined three types of data: (1) unstructured clinical
notes, from which we extracted single words, two-word phrases (bigrams), and
three-word phrases (trigrams) as features, weighted using Term Frequency-Inverse
Document Frequency (TF-IDF) to capture their relative importance; (2)
International Classification of Diseases (ICD) codes; and (3) medication
prescriptions related to MCI/ADRD. We trained a regularized logistic
regression model to predict MCI/ADRD diagnoses and evaluated its performance
using standard metrics including area under the receiver operating characteristic curve
(AUROC), area under the precision-recall curve (AUPRC), accuracy, specificity,
precision, recall, and F1 score.
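
The feature construction and model described in the Methods can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy patients, the `note:`/`icd:`/`med:` feature prefixes, the smoothed IDF formula, the L2 penalty, and the gradient-descent settings are all hypothetical choices made for the example.

```python
import math
import re
from collections import Counter


def ngrams(text, n_max=3):
    """Return all 1- to n_max-word phrases in a note (lowercased)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf_features(notes):
    """TF-IDF weight per n-gram, L2-normalised per note (smoothed IDF, an assumption)."""
    counts = [Counter(ngrams(note)) for note in notes]
    doc_freq = Counter(term for c in counts for term in c)
    n_docs = len(notes)
    rows = []
    for c in counts:
        row = {t: tf * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
               for t, tf in c.items()}
        norm = math.sqrt(sum(v * v for v in row.values())) or 1.0
        rows.append({"note:" + t: v / norm for t, v in row.items()})
    return rows


def build_features(patients):
    """Merge note TF-IDF weights with binary ICD and medication indicators."""
    rows = tfidf_features([p["note"] for p in patients])
    for row, p in zip(rows, patients):
        row.update({"icd:" + code: 1.0 for code in p["icd"]})
        row.update({"med:" + drug: 1.0 for drug in p["meds"]})
    return rows


def predict_proba(weights, bias, row):
    """Logistic (sigmoid) probability of an MCI/ADRD label for one patient."""
    z = bias + sum(weights.get(k, 0.0) * v for k, v in row.items())
    return 1.0 / (1.0 + math.exp(-z))


def fit_logreg(rows, labels, l2=0.01, lr=0.5, epochs=300):
    """L2-regularised logistic regression fitted by batch gradient descent."""
    weights, bias = {}, 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w, grad_b = Counter(), 0.0
        for row, y in zip(rows, labels):
            err = predict_proba(weights, bias, row) - y
            grad_b += err / n
            for k, v in row.items():
                grad_w[k] += err * v / n
        for k in set(weights) | set(grad_w):
            w = weights.get(k, 0.0)
            weights[k] = w - lr * (grad_w[k] + l2 * w)
        bias -= lr * grad_b
    return weights, bias


# Four made-up patients, for illustration only (not study data).
patients = [
    {"note": "progressive memory loss, started donepezil",
     "icd": ["G30.9"], "meds": ["donepezil"], "label": 1},
    {"note": "mild cognitive impairment, short term memory loss",
     "icd": ["G31.84"], "meds": [], "label": 1},
    {"note": "knee pain after a fall, memory intact",
     "icd": ["M25.561"], "meds": ["ibuprofen"], "label": 0},
    {"note": "annual visit, no cognitive complaints",
     "icd": ["Z00.00"], "meds": [], "label": 0},
]
rows = build_features(patients)
labels = [p["label"] for p in patients]
weights, bias = fit_logreg(rows, labels)
probs = [predict_proba(weights, bias, r) for r in rows]
```

Representing each patient as a sparse dictionary keeps the three data sources in a single feature space, so one linear model can weigh a note phrase, a diagnosis code, and a prescription against each other. At realistic scale, a library implementation with a held-out split and a tuned regularization strength would be the more practical choice.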
**Results:** Thirty percent of patients in the cohort carried diagnoses of
MCI/ADRD based on manual review. When evaluated on a held-out test set, the
best model, which used clinical notes, ICD codes, and medications, achieved an AUROC of
0.98, an AUPRC of 0.98, an accuracy of 0.93, a sensitivity (recall) of 0.91, a
specificity of 0.96, a precision of 0.96, and an F1 score of 0.93. The
estimated overall accuracy for patients randomly selected from EHRs was
99.88%.
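
For readers less familiar with the metrics reported above, the following sketch shows how the threshold-based ones follow from confusion-matrix counts, and how AUROC can be read as the probability that a randomly chosen positive patient is scored above a randomly chosen negative one. The counts and scores are hypothetical and are not the study's confusion matrix; the function names are illustrative.

```python
def threshold_metrics(tp, fp, tn, fn):
    """Threshold-based metrics computed from confusion-matrix counts."""
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}


def auroc(scores, labels):
    """AUROC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts for a 100-patient test set (illustration only).
m = threshold_metrics(tp=40, fp=5, tn=45, fn=10)

# Hypothetical model scores: one positive patient is ranked below one negative patient.
example_auroc = auroc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

The pairwise AUROC is quadratic in the number of patients, which is fine for a sketch; rank-based formulas give the same value more efficiently on large cohorts.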
**Conclusion:** Automated EHR phenotyping accurately identifies patients with
MCI/ADRD based on clinical notes, ICD codes, and medication records. This
approach holds potential for large-scale MCI/ADRD research utilizing EHR
databases.
Provider:
BDSP
Created:
2025-09-26



