Extraction of clinical phenotypes for Alzheimer disease dementia from clinical notes using natural language processing

Name: Extraction of clinical phenotypes for Alzheimer disease dementia from clinical notes using natural language processing
Creator: Dryad
Published: 2025-05-01 05:37:10
License: No description available

Source: DataCite Commons. Updated: 2025-05-01. Indexed: 2025-05-10.

Download link:

https://datadryad.org/dataset/doi:10.5061/dryad.0vt4b8h3g

Description:

Objectives: There is much interest in utilizing clinical data for developing prediction models for Alzheimer disease (AD) risk, progression, and outcomes. Existing studies have mostly utilized curated research registries, image analysis, and structured Electronic Health Record (EHR) data. However, much critical information resides in relatively inaccessible unstructured clinical notes within the EHR.

Materials and Methods: We developed a natural language processing (NLP)-based pipeline to extract AD-related clinical phenotypes, documenting strategies for success and assessing the utility of mining unstructured clinical notes. We evaluated the pipeline against gold-standard manual annotations performed by two clinical dementia experts for AD-related clinical phenotypes including medical comorbidities, biomarkers, neurobehavioral test scores, behavioral indicators of cognitive decline, family history, and neuroimaging findings.

Results: Documentation rates for each phenotype varied in the structured versus unstructured EHR. Inter-annotator agreement was high (Cohen’s kappa = 0.72–1) and positively correlated with the NLP-based phenotype extraction pipeline’s performance (average F1-score = 0.65–0.99) for each phenotype.

Discussion: We developed an automated NLP-based pipeline to extract informative phenotypes that may improve the performance of eventual machine-learning predictive models for AD. In the process, we examined documentation practices for each phenotype relevant to the care of AD patients and identified factors for success.

Conclusion: Success of our NLP-based phenotype extraction pipeline depended on domain-specific knowledge and focus on a specific clinical domain instead of maximizing generalizability.
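The description reports two evaluation metrics, Cohen's kappa for inter-annotator agreement and F1-score for extraction performance, without showing how they are computed. The following is a minimal sketch of both metrics for a single binary phenotype (present = 1, absent = 0). It is not the authors' evaluation code; the function names `cohen_kappa` and `f1_score` and all label lists are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def f1_score(gold, predicted, positive=1):
    """F1 of predicted labels against gold labels for one positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for one phenotype across eight notes (illustrative only).
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 0, 0]
pipeline_output = [1, 0, 0, 1, 1, 0, 1, 0]

kappa = cohen_kappa(annotator_1, annotator_2)
f1 = f1_score(annotator_1, pipeline_output)
print(f"kappa={kappa:.2f}, F1={f1:.2f}")
```

Computing both values per phenotype, as sketched here, is what would allow agreement and extraction performance to be compared phenotype by phenotype in the way the Results describe.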

Provider: Dryad

Created: 2023-02-10
