Automated Prediction of Glasgow Coma Scale Scores from Unstructured Electronic Health Records: a Natural Language Processing Approach

Creator: BDSP
Published: 2026-04-17 14:15:17
License: No description provided

Source: DataCite Commons; updated 2026-04-17; indexed 2026-04-25

Download link:

https://bdsp.io/content/gd2l2o4nrl4dl9sok4bd/1.0.0/

Description:

**Background:** Multicenter electronic health records (EHR) can support quality improvement and comparative effectiveness research in critical care. However, EHR-based research is limited by the difficulty of abstracting key clinical variables, including a patient's level of consciousness.

**Objective:** Our objective was to develop a natural language processing (NLP) model to predict Glasgow Coma Scale (GCS) scores from daily EHR notes.

**Methods:** The study included adult patients (≥18 years) admitted to Mass General Brigham (MGB) hospitals (2017-2024) and patients from the Medical Information Mart for Intensive Care database (MIMIC-III v1.4, 2001-2012). Data from all patients at both institutions were split into training (70%) and hold-out test (30%) sets. Predictor variables were daily notes, age, sex, and admission type. We trained a pooled ordinal regression model (ordinalNet) with an elastic net penalty to predict the lowest daily level of consciousness across three classes (severe, GCS 3-8; moderate, GCS 9-12; mild, GCS 13-15), and a pooled linear model to predict continuous GCS scores (3-15). Gold-standard GCS was obtained from structured flowsheet data. External generalizability was assessed with a single-institution ordinal model trained on MGB and tested on MIMIC. After post-hoc calibration, the ordinal models were evaluated on the hold-out test sets using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC), and the linear model using root mean square error (RMSE) and Pearson correlation.

**Results:** The modeling cohort included 145,897 patients (MGB = 123,257; MIMIC = 22,640) with 1,446,965 hospital days across the training and test sets; mean age was 62 (SD 18) years, and the sex distribution was balanced. The pooled ordinalNet model achieved an AUROC of 0.96 [95% CI 0.96-0.96] and an AUPRC of 0.77 [0.76-0.77]. The single-institution ordinal model (trained on MGB, tested on MIMIC) achieved an AUROC of 0.90 [0.89-0.90] and an AUPRC of 0.80 [0.79-0.80]. The pooled linear model achieved an RMSE of 2.30 [2.30-2.30] and a correlation of 0.76 [0.76-0.76]. Predictions of severe GCS were driven by terms indicating unresponsiveness and critical interventions, predictions of moderate GCS by intermediate alertness descriptors, and predictions of mild GCS by mentions of normal or awake behavior.

**Conclusions:** Pooled ordinal and linear models can accurately predict GCS from unstructured data and can support large-scale phenotyping of neurological assessments for future critical care research.
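
For readers less familiar with penalized ordinal regression, one common way to write an elastic-net cumulative-logit model over the three GCS classes is shown below. The abstract names ordinalNet and an elastic net penalty but not the link function, parameterization, or tuning values, so the logit link, the sign convention, and the symbols θ_k, β, λ and α are assumptions for illustration rather than the authors' exact specification.

```latex
% Assumed setup, not taken from the study:
% classes k = 1 (severe, GCS 3-8), k = 2 (moderate, GCS 9-12), k = 3 (mild, GCS 13-15)
% x_i: features for patient-day i (note terms, age, sex, admission type); sigma(z) = 1 / (1 + e^{-z})
\begin{aligned}
\eta_i &= \mathbf{x}_i^\top \boldsymbol{\beta}, \qquad
  P(Y_i \le k \mid \mathbf{x}_i) = \sigma(\theta_k - \eta_i), \quad k \in \{1, 2\}, \quad \theta_1 < \theta_2 \\
P(Y_i = 1 \mid \mathbf{x}_i) &= \sigma(\theta_1 - \eta_i) \\
P(Y_i = 2 \mid \mathbf{x}_i) &= \sigma(\theta_2 - \eta_i) - \sigma(\theta_1 - \eta_i) \\
P(Y_i = 3 \mid \mathbf{x}_i) &= 1 - \sigma(\theta_2 - \eta_i) \\
(\hat{\boldsymbol{\theta}}, \hat{\boldsymbol{\beta}}) &= \operatorname*{arg\,min}_{\boldsymbol{\theta}, \boldsymbol{\beta}}
  \; -\frac{1}{N} \sum_{i=1}^{N} \log P(Y_i = y_i \mid \mathbf{x}_i)
  + \lambda \sum_{j} \left( \alpha \lvert \beta_j \rvert + \frac{1 - \alpha}{2} \beta_j^2 \right)
\end{aligned}
```

Here λ ≥ 0 sets the overall penalty strength and α ∈ [0, 1] mixes lasso (α = 1) and ridge (α = 0) shrinkage; only β is penalized, not the thresholds θ_k. Under this sign convention a larger η_i lowers every cumulative probability P(Y_i ≤ k), shifting mass toward the milder classes.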

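The class binning and the evaluation metrics named in the Methods can be sketched in plain Python as below. This is not the authors' code: the patient-level split (so that no patient's days fall in both sets), the one-vs-rest macro averaging used to get a single AUROC/AUPRC over three classes, and all helper names (`gcs_to_class`, `split_patients`, `auroc`, `average_precision`, `macro_ovr`, `rmse`, `pearson`) are hypothetical. Post-hoc calibration and bootstrap confidence intervals are omitted.

```python
import math
import random


def gcs_to_class(gcs: int) -> int:
    """Bin a total GCS (3-15) into the abstract's three ordinal classes.

    0 = severe (GCS 3-8), 1 = moderate (GCS 9-12), 2 = mild (GCS 13-15).
    """
    if not 3 <= gcs <= 15:
        raise ValueError(f"GCS must be in 3..15, got {gcs}")
    if gcs <= 8:
        return 0
    if gcs <= 12:
        return 1
    return 2


def split_patients(patient_ids, test_frac=0.3, seed=0):
    """Hypothetical 70/30 split by patient, so one patient's days never land in both sets."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_frac)
    return set(ids[n_test:]), set(ids[:n_test])  # (train_ids, test_ids)


def auroc(labels, scores):
    """Binary AUROC: chance a random positive outscores a random negative (ties count 1/2).

    O(n_pos * n_neg); fine for illustration, too slow for ~1.4M patient-days.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise AUPRC estimate: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])  # stable: ties keep input order
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else float("nan")


def macro_ovr(metric, classes, class_probs, n_classes=3):
    """Assumed aggregation: average a binary metric over one-vs-rest problems.

    class_probs[i][k] is the predicted probability that row i belongs to class k.
    """
    per_class = [
        metric([int(c == k) for c in classes], [p[k] for p in class_probs])
        for k in range(n_classes)
    ]
    return sum(per_class) / n_classes


def rmse(y_true, y_pred):
    """Root mean square error between gold and predicted continuous GCS."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    gold = [3, 7, 10, 12, 14, 15]  # toy lowest-daily GCS values, not study data
    classes = [gcs_to_class(g) for g in gold]  # [0, 0, 1, 1, 2, 2]
    probs = [[0.8, 0.15, 0.05], [0.6, 0.3, 0.1], [0.2, 0.6, 0.2],
             [0.1, 0.5, 0.4], [0.05, 0.25, 0.7], [0.02, 0.08, 0.9]]
    print(macro_ovr(auroc, classes, probs), macro_ovr(average_precision, classes, probs))
    pred_gcs = [4, 6, 10, 11, 14, 15]  # toy output of a continuous-GCS model
    print(rmse(gold, pred_gcs), pearson(gold, pred_gcs))
```

Splitting by patient rather than by patient-day is a common way to avoid leakage between near-identical consecutive notes; at the study's scale, library implementations (for example scikit-learn's metrics) would replace the quadratic `auroc` shown here.
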
Provider:

BDSP

Created:

2026-04-17