
Automated Prediction of Glasgow Coma Scale Scores from Unstructured Electronic Health Records: a Natural Language Processing Approach

DataCite Commons · Updated 2026-04-17 · Indexed 2026-04-25
Download link:
https://bdsp.io/content/gd2l2o4nrl4dl9sok4bd/1.0.0/
Summary:
**Background:** Multicenter electronic health records (EHRs) can support quality improvement and comparative effectiveness research in critical care. However, EHR-based research is limited by the difficulty of abstracting key clinical variables, such as a patient's level of consciousness.

**Objective:** To develop a natural language processing (NLP) model that predicts Glasgow Coma Scale (GCS) scores from daily EHR notes.

**Methods:** The study included adult patients (≥18 years) admitted to Mass General Brigham (MGB) hospitals (2017-2024) and patients from the Medical Information Mart for Intensive Care (MIMIC-III) database, v1.4 (2001-2012). The pooled dataset of patients from both institutions was split into training (70%) and hold-out test (30%) sets. Predictor variables were daily notes, age, sex and admission type. We trained a pooled ordinal regression model (ordinalNet) with an elastic net penalty to predict the lowest daily level of consciousness in three classes, severe (GCS 3-8), moderate (GCS 9-12) and mild (GCS 13-15), and a pooled linear model to predict continuous GCS scores (3-15). Gold-standard GCS was obtained from structured flowsheet data. External generalizability was assessed with a single-institution ordinal model trained on MGB and tested on MIMIC. After post-hoc calibration, the ordinal models were evaluated on the hold-out test sets using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC), and the linear model using root mean square error (RMSE) and Pearson correlation.

**Results:** The modeling cohort comprised 145,897 patients (MGB = 123,257; MIMIC = 22,640) contributing 1,446,965 hospital days across the training and test sets; mean age was 62 (SD 18) years, and the sex distribution was balanced. The pooled ordinalNet model achieved an AUROC of 0.96 [95% CI 0.96-0.96] and an AUPRC of 0.77 [0.76-0.77]. The single-institution ordinal model achieved an AUROC of 0.90 [0.89-0.90] and an AUPRC of 0.80 [0.79-0.80]. The pooled linear model achieved an RMSE of 2.30 [2.30-2.30] and a correlation of 0.76 [0.76-0.76]. Predictions of severe GCS were driven by terms indicating unresponsiveness and critical interventions, predictions of moderate GCS by descriptors of intermediate alertness, and predictions of mild GCS by mentions of normal or awake behavior.

**Conclusions:** Pooled ordinal and linear models can accurately predict GCS from unstructured clinical notes and can support large-scale phenotyping of neurological assessments in future critical care research.
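The abstract does not publish the labeling pipeline. The following is a minimal Python sketch, assuming a hypothetical flowsheet row layout of `(patient_id, day, gcs)`, of how gold-standard labels matching the Methods could be derived: keep the lowest GCS recorded for each patient-day and bin it into the three ordinal classes. The names `gcs_to_class`, `daily_labels` and `ORDINAL_RANK` are illustrative only and are not part of ordinalNet or the authors' code.

```python
# Severity bands from the abstract, listed in ordinal order (worst to best).
CLASS_ORDER = ("severe", "moderate", "mild")
ORDINAL_RANK = {label: rank for rank, label in enumerate(CLASS_ORDER)}


def gcs_to_class(score: int) -> str:
    """Map a total GCS score (3-15) to severe (3-8), moderate (9-12) or mild (13-15)."""
    if not 3 <= score <= 15:
        raise ValueError(f"total GCS must be between 3 and 15, got {score}")
    if score <= 8:
        return "severe"
    if score <= 12:
        return "moderate"
    return "mild"


def daily_labels(flowsheet_rows):
    """Keep the lowest GCS per (patient_id, day) and attach its severity class.

    `flowsheet_rows` is an iterable of (patient_id, day, gcs) tuples; this layout
    is hypothetical and only stands in for structured flowsheet data.
    Returns {(patient_id, day): (lowest_gcs, class_label)}.
    """
    lowest = {}
    for patient_id, day, gcs in flowsheet_rows:
        key = (patient_id, day)
        if key not in lowest or gcs < lowest[key]:
            lowest[key] = gcs
    return {key: (score, gcs_to_class(score)) for key, score in lowest.items()}


if __name__ == "__main__":
    rows = [
        ("p1", "day1", 14),
        ("p1", "day1", 7),   # a second, lower assessment on the same day
        ("p1", "day2", 11),
        ("p2", "day1", 15),
    ]
    for key, (score, label) in sorted(daily_labels(rows).items()):
        print(key, score, label, ORDINAL_RANK[label])
```

Taking the minimum per patient-day mirrors the abstract's target, the lowest daily level of consciousness. The explicit rank (severe = 0, moderate = 1, mild = 2) preserves the class ordering that an ordinal model such as ordinalNet can exploit and that a plain multiclass classifier would ignore.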
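The abstract reports AUROC for the ordinal models and RMSE and Pearson correlation for the linear model. A dependency-free sketch of those three metrics follows. It assumes one-vs-rest scoring for a single class (for example, the predicted probability of "severe"), because the abstract does not state how AUROC was aggregated over the three classes; AUPRC and the confidence intervals are not reproduced here, and the example inputs are made up for illustration.

```python
import math


def auroc(labels, scores):
    """Binary AUROC as the Mann-Whitney probability that a random positive
    outscores a random negative; ties count as one half.

    Quadratic in the number of examples, so suitable only for illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted GCS scores."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical one-vs-rest check for the "severe" class:
    # 1 = lowest daily GCS in 3-8, score = predicted probability of "severe".
    is_severe = [1, 0, 1, 0, 0]
    p_severe = [0.91, 0.40, 0.65, 0.10, 0.70]
    print("AUROC (severe vs rest):", auroc(is_severe, p_severe))

    # Hypothetical continuous predictions from a linear model.
    gcs_true = [3, 8, 12, 15, 14]
    gcs_pred = [5.0, 7.5, 11.0, 13.5, 14.5]
    print("RMSE:", rmse(gcs_true, gcs_pred))
    print("Pearson r:", pearson(gcs_true, gcs_pred))
```

Pure Python keeps the sketch self-contained; in practice a library implementation such as scikit-learn's `roc_auc_score` and `average_precision_score` would be the usual choice, including for AUPRC.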
Provider:
BDSP
Created:
2026-04-17