Automated extraction of stroke severity from unstructured electronic health records using natural language processing
Source: DataCite Commons | Updated: 2025-10-02 | Indexed: 2026-02-08
Download link:
https://bdsp.io/content/hz17q634s72iwsky8wri/
Resource description:
### Background
Multicenter electronic health records can support quality improvement and
comparative effectiveness research in stroke. However, limitations of
electronic health record-based research include challenges in abstracting key
clinical variables, including stroke severity, along with missing data. We
developed a natural language processing model that reads electronic health
record notes to directly extract the National Institutes of Health Stroke
Scale score when documented and predict the score from clinical documentation
when missing.
### Methods and Results
The study included notes from patients with acute stroke (aged ≥18 years)
admitted to Massachusetts General Hospital (2015-2022). The Massachusetts
General Hospital data were divided into training/holdout test (70%/30%) sets.
We developed a 2‐stage model to predict the admission National Institutes of
Health Stroke Scale, obtained from the GWTG (Get With The Guidelines) stroke
registry. We trained a model with the least absolute shrinkage and selection
operator. For test notes with documented National Institutes of Health Stroke
Scale, scores were extracted using regular expressions (stage 1); when not
documented, least absolute shrinkage and selection operator was used for
prediction (stage 2). The 2‐stage model was tested on the holdout test set and
validated in the Medical Information Mart for Intensive Care (2001-2012)
version 1.4, using root mean squared error and Spearman correlation. We
included 4163 patients (Massachusetts General Hospital, 3876; Medical
Information Mart for Intensive Care, 287); average age, 69 (SD, 15) years; 53%
men, and 72% White individuals. The model achieved a root mean squared error
of 2.89 (95% CI, 2.62-3.19) and Spearman correlation of 0.92 (95% CI,
0.91-0.93) in the Massachusetts General Hospital test set, and 2.20 (95% CI,
1.69-2.66) and 0.96 (95% CI, 0.94-0.97), respectively, in the Medical
Information Mart for Intensive Care validation set.
### Conclusions
The automatic natural language processing-based model can enable large‐scale
stroke severity phenotyping from the electronic health record and support
real‐world quality improvement and comparative effectiveness studies in
stroke.
Provider:
BDSP
Created:
2025-10-02



