Automated Prediction of Glasgow Coma Scale Scores from Unstructured Electronic Health Records: a Natural Language Processing Approach
Source: DataCite Commons; updated 2026-04-17; indexed 2026-04-25
Download link:
https://bdsp.io/content/gd2l2o4nrl4dl9sok4bd/1.0.0/
Description:
**Background:** Multicenter electronic health records (EHR) can support
quality improvement and comparative effectiveness research in critical care.
However, limitations of EHR-based research include challenges in abstracting
key clinical variables, including a patient's level of consciousness.
**Objective:** The objective of our study was to develop a natural language
processing (NLP) model to predict the Glasgow Coma Scale (GCS) scores from
daily EHR notes.
**Methods:** The study included adult patients (≥18 years) admitted to Mass
General Brigham (MGB) hospitals (2017-2024) and patients from the Medical
Information Mart for Intensive Care III (MIMIC-III) database, v1.4 (2001-2012).
The pooled dataset of patients from both institutions was split into training
and hold-out test sets (70%/30%). Input variables consisted of daily notes, age,
sex and admission type. We trained a pooled ordinal regression model
(ordinalNet) with an elastic net penalty to predict the lowest daily level of
consciousness across three classes: severe (GCS 3-8), moderate (GCS 9-12) and
mild (GCS 13-15), and a pooled linear model to predict continuous GCS scores
(3-15). Gold standard GCS was obtained from structured flowsheet data.
External generalizability was assessed using a single-institution ordinal
model trained on MGB and tested on MIMIC. Following post-hoc calibration,
performance on the hold-out test sets was evaluated using the areas under the
receiver operating characteristic curve (AUROC) and precision-recall curve
(AUPRC) for the ordinal models, and root mean square error (RMSE) and Pearson
correlation for the linear model.
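
The ordinal target described above (lowest daily GCS, binned into three classes) can be sketched in a few lines of Python. This is only an illustration of the labeling rule stated in the abstract, not the authors' pipeline: the function names, the `(patient_id, day, gcs)` record layout, and the sample rows are all assumptions made for the example.

```python
def gcs_severity(gcs: int) -> str:
    """Map a total GCS score (3-15) to the three classes named in the abstract."""
    if not 3 <= gcs <= 15:
        raise ValueError(f"GCS must be between 3 and 15, got {gcs}")
    if gcs <= 8:
        return "severe"    # GCS 3-8
    if gcs <= 12:
        return "moderate"  # GCS 9-12
    return "mild"          # GCS 13-15


def lowest_daily_gcs(records):
    """Return {(patient_id, day): lowest GCS} from (patient_id, day, gcs) tuples."""
    lowest = {}
    for patient_id, day, gcs in records:
        key = (patient_id, day)
        if key not in lowest or gcs < lowest[key]:
            lowest[key] = gcs
    return lowest


# Hypothetical flowsheet rows (illustrative values only, not study data).
records = [
    ("p1", "2020-01-01", 14),
    ("p1", "2020-01-01", 7),
    ("p1", "2020-01-02", 11),
    ("p2", "2020-01-01", 15),
]

daily = lowest_daily_gcs(records)
labels = {key: gcs_severity(score) for key, score in daily.items()}
```

Here the first day for `p1` is labeled from its lowest recorded score (7) rather than its highest (14), which matches the "lowest daily level of consciousness" wording in the Methods.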
**Results:** Our modeling cohort included 145,897 patients (MGB = 123,257;
MIMIC = 22,640) with 1,446,965 days of hospitalization across the training and
test sets; mean age was 62 years (SD 18), with a balanced sex distribution. The
pooled ordinalNet achieved AUROC and AUPRC [95% CI] of 0.96 [0.96-0.96] and
0.77 [0.76-0.77]. The single-institution ordinal model achieved AUROC 0.90
[0.89-0.90] and AUPRC 0.80 [0.79-0.80]. The pooled linear model achieved RMSE
2.30 [2.30-2.30] and correlation 0.76 [0.76-0.76]. Predictions for severe GCS
were driven by terms indicating unresponsiveness and critical interventions,
moderate GCS by intermediate alertness descriptors, and mild GCS by mentions
of normal or awake behavior.
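
For readers who want to reproduce the continuous-score metrics on their own predictions, the following is a minimal sketch of RMSE and Pearson correlation in plain Python. It assumes paired lists of observed and predicted GCS values; the helper names and the example numbers are hypothetical and do not come from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between paired observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical observed vs. predicted GCS scores (illustrative values only).
observed = [3, 8, 12, 15]
predicted = [5, 8, 11, 14]

error = rmse(observed, predicted)
correlation = pearson(observed, predicted)
```

RMSE is expressed in GCS points, so a value of 2.30 as reported above corresponds to a typical prediction error of a little over two points on the 3-15 scale.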
**Conclusions:** Pooled ordinal and linear models can accurately predict GCS
from unstructured data and can support large-scale phenotyping of neurological
assessments for future critical care research.
Provider:
BDSP
Created:
2026-04-17



