Extraction of clinical phenotypes for Alzheimer disease dementia from clinical notes using natural language processing
Source: DataCite Commons | Updated: 2025-05-01 | Indexed: 2025-05-10
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.0vt4b8h3g
Description:
Objectives: There is much interest in utilizing clinical data for
developing prediction models for Alzheimer disease (AD) risk, progression,
and outcomes. Existing studies have mostly utilized curated research
registries, image analysis, and structured electronic health record (EHR)
data. However, much critical information resides in relatively
inaccessible unstructured clinical notes within the EHR.

Materials and Methods: We developed a natural language processing
(NLP)-based pipeline to extract AD-related clinical phenotypes, documenting
strategies for success and assessing the utility of mining unstructured
clinical notes. We evaluated the pipeline against gold-standard manual
annotations performed by two clinical dementia experts for AD-related
clinical phenotypes including medical comorbidities, biomarkers,
neurobehavioral test scores, behavioral indicators of cognitive decline,
family history, and neuroimaging findings.

Results: Documentation rates for each phenotype varied in the structured
versus unstructured EHR. Inter-annotator agreement was high (Cohen's
kappa = 0.72–1) and positively correlated with the NLP-based phenotype
extraction pipeline's performance (average F1-score = 0.65–0.99) for each
phenotype.

Discussion: We developed an automated NLP-based pipeline to extract
informative phenotypes that may improve the performance of eventual
machine-learning predictive models for AD. In the process, we examined
documentation practices for each phenotype relevant to the care of AD
patients and identified factors for success.

Conclusion: Success of our NLP-based phenotype extraction pipeline depended
on domain-specific knowledge and focus on a specific clinical domain
instead of maximizing generalizability.
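
The two metrics reported in the description, Cohen's kappa for inter-annotator agreement and the F1-score for extraction performance, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names and the toy binary labels (1 = phenotype documented, 0 = not documented) are assumptions made only for illustration, and the dataset's actual annotation format may differ.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(labels_a) | set(labels_b)
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


def f1_score(gold, predicted, positive=1):
    """F1-score of predicted labels against gold labels for one class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels for one phenotype across eight notes.
annotator_a = [1, 1, 0, 0, 1, 0, 1, 0]
annotator_b = [1, 1, 0, 0, 1, 0, 0, 0]
pipeline_output = [1, 1, 0, 1, 1, 0, 1, 0]

kappa = cohen_kappa(annotator_a, annotator_b)
f1 = f1_score(annotator_a, pipeline_output)
print(f"kappa = {kappa:.2f}, F1 = {f1:.2f}")
```

Computing both values per phenotype, as in the sketch, is what allows agreement between annotators to be compared with pipeline performance phenotype by phenotype.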
Provider:
Dryad
Created:
2023-02-10



