Extraction of clinical phenotypes for Alzheimer disease dementia from clinical notes using natural language processing

Source: DataONE. Updated 2023-02-10. Indexed 2025-07-19.

Download link:

https://search.dataone.org/view/sha256:c04ff069338d548c1e698e30d95cd417a58926704189b1ff3f9757e162bf8d70

Description:

Objectives: There is much interest in using clinical data to develop prediction models for Alzheimer disease (AD) risk, progression, and outcomes. Existing studies have mostly used curated research registries, image analysis, and structured Electronic Health Record (EHR) data. However, much critical information resides in relatively inaccessible unstructured clinical notes within the EHR. Materials and Methods: We developed a natural language processing (NLP)-based pipeline to extract AD-related clinical phenotypes, documenting strategies for success and assessing the utility of mining unstructured clinical notes. We evaluated the pipeline against gold-standard manual annotations performed by two clinical dementia experts for AD-related clinical phenotypes including medical comorbidities, biomarkers, neurobehavioral test scores, behavioral indicators of cognitive decline, family history, and neuroimaging findings. Results: Documentation rates for each phenotype varied in the st...

We developed a natural language processing (NLP)-based pipeline that contains independent NLP modules, each targeting the extraction of one of ten clinical phenotypes relevant to Alzheimer disease dementia progression. The pipeline was trained on unstructured clinical notes originating from Allscripts TouchWorks and associated with AD dementia patient office visits that occurred between June 1, 2013, and May 31, 2018. The notes were extracted from the Washington University in St. Louis Research Data Core (RDC), a repository of patient clinical data from BJC HealthCare and Washington University Physicians. The targeted phenotypes included neurobehavioral test scores (Clinical Dementia Rating and Mini-Mental State Exam) and their corresponding test dates, comorbidities (hypertension and depression), neuroimaging findings (presence of atrophy or infarct), behavioral indicators of dementia (repeating and misplacing), biomarker levels (total and phosphorylated tau protein levels), and family history (whether there was a f...

Data preprocessing steps were performed using the Python Pandas and striprtf (version 0.0.10) packages. Linguamatics I2E query files (*.i2qy) and Enterprise Architect Simulation Library (EASL) code for each NLP module can be found on the Linguamatics Community webpage (https://community.linguamatics.com/queries), which is accessible after creating a free account. Linguamatics I2E software is required to open the query files (*.i2qy) directly, but the logic underlying the NLP modules can be understood by referencing the EASL code.
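
For readers without access to Linguamatics I2E, the following is a minimal Python sketch of what one phenotype module does in principle: finding a Mini-Mental State Exam (MMSE) score and, when present, its test date in free-text notes. It is not the authors' I2E query or EASL logic. It swaps in plain regular expressions from the Python standard library, and the names `MmseResult` and `extract_mmse`, the patterns, the sentence splitting, and the `M/D/YYYY` date format are all assumptions made for illustration.

```python
import re
from datetime import date
from typing import List, NamedTuple, Optional


class MmseResult(NamedTuple):
    """Hypothetical output record; not part of the published pipeline."""
    score: int                  # numerator of an "NN/30" score, 0 to 30
    test_date: Optional[date]   # date found in the same sentence, if any


# Assumed patterns for illustration only.
_MENTION = re.compile(r"\bMMSE\b|Mini[- ]Mental State Exam", re.IGNORECASE)
# "NN/30" that is not part of a longer slash-separated date such as 4/30/2017.
_SCORE = re.compile(r"(?<![\d/])(?P<score>\d{1,2})\s*/\s*30(?![\d/])")
# Dates written as M/D/YYYY (an assumed note convention).
_DATE = re.compile(r"(?<!\d)(?P<m>\d{1,2})/(?P<d>\d{1,2})/(?P<y>\d{4})(?!\d)")


def extract_mmse(note: str) -> List[MmseResult]:
    """Return one result per sentence that mentions MMSE with a valid score."""
    results: List[MmseResult] = []
    # Naive sentence split on end punctuation or line breaks.
    for sentence in re.split(r"(?<=[.!?])\s+|\n+", note):
        if not _MENTION.search(sentence):
            continue
        score_match = _SCORE.search(sentence)
        if score_match is None:
            continue
        score = int(score_match.group("score"))
        if score > 30:  # MMSE is scored out of 30; skip implausible values
            continue
        test_date = None
        date_match = _DATE.search(sentence)
        if date_match is not None:
            try:
                test_date = date(
                    int(date_match.group("y")),
                    int(date_match.group("m")),
                    int(date_match.group("d")),
                )
            except ValueError:
                test_date = None  # e.g. 13/45/2017 is not a real date
        results.append(MmseResult(score, test_date))
    return results


if __name__ == "__main__":
    example = "Patient seen with daughter. MMSE on 4/30/2017 was 24/30."
    print(extract_mmse(example))
```

A rule-based sketch like this only handles the phrasing it was written for. Comparing its output with manual annotations, as the description above reports for the actual pipeline, is what shows how many real-world variants it misses.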

Created:

2025-07-15
