Narcolepsy Risk Estimation from Clinical Notes

Name: Narcolepsy Risk Estimation from Clinical Notes
Creator: BDSP
Published: 2026-03-03 04:09:57
License: No description available

DataCite Commons (updated 2026-03-03, indexed 2026-03-29)

Download link:

https://bdsp.io/content/qsyoj1ut1t90zzl9ajpr/1.0/

Description:

Narcolepsy is a chronic neurological disorder that is often underdiagnosed and subject to long diagnostic delays. We developed and validated machine learning models to phenotype narcolepsy type 1 (NT1) and narcolepsy type 2/idiopathic hypersomnia (NT2/IH) using electronic health record (EHR) data from five sites within the Brain Data Science Platform (BDSP): Mass General Brigham, Beth Israel Deaconess Medical Center, Boston Children's Hospital, Stanford University, and Emory University. Physician reviewers manually annotated clinical notes following a standardized protocol, and model features were derived from ICD codes, medication orders, and natural language keyword extraction. For cross-sectional classification, we trained logistic regression, random forest, gradient boosting, and XGBoost models using nested leave-one-site-out cross-validation. NT1 classification achieved mean AUROCs of 0.991-0.994 and AUPRCs of 0.906-0.935; NT2/IH classification was more challenging, with mean AUROCs of 0.967-0.984 and AUPRCs of 0.692-0.778. For longitudinal prediction, we trained regularized logistic regression models (SGD with L1 penalty) on cumulative NLP features from pre-diagnostic notes, applying a 6-month horizon exclusion so that the models do not learn from diagnostic-workup features. Leave-one-site-out validation achieved AUROCs of 0.80 (any narcolepsy) and 0.79 (NT1), enabling identification of at-risk patients prior to clinical diagnosis. We release the associated data and code to support reproducible research in narcolepsy phenotyping from large-scale EHR data.
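
The two evaluation ideas named in the description, nested leave-one-site-out splitting and a pre-diagnosis horizon exclusion, can be sketched in a few lines of standard-library Python. This is an illustrative sketch only, not the released code: the function names, the site labels, the `"date"` field on each note, and the approximation of six months as 183 days are all assumptions made for the example, as is the reading that notes falling inside the six months before diagnosis are the ones excluded.

```python
from datetime import date, timedelta


def leave_one_site_out(sites):
    """Yield (held_out_site, train_idx, test_idx) once per distinct site.

    `sites` is a list giving the site label of each sample.
    """
    for held_out in sorted(set(sites)):
        train_idx = [i for i, s in enumerate(sites) if s != held_out]
        test_idx = [i for i, s in enumerate(sites) if s == held_out]
        yield held_out, train_idx, test_idx


def nested_leave_one_site_out(sites):
    """Yield outer folds, each carrying its own inner leave-one-site-out folds.

    The outer loop holds out one site for testing. The inner loop repeats
    the split on the remaining sites only (e.g. for hyperparameter tuning),
    so the outer test site never appears in any inner fold.
    """
    for test_site, outer_train, test_idx in leave_one_site_out(sites):
        inner_sites = [sites[i] for i in outer_train]
        inner_folds = []
        for val_site, inner_train, inner_val in leave_one_site_out(inner_sites):
            # Map positions within the outer training set back to
            # indices in the original sample list.
            inner_folds.append((
                val_site,
                [outer_train[j] for j in inner_train],
                [outer_train[j] for j in inner_val],
            ))
        yield test_site, outer_train, test_idx, inner_folds


def notes_before_horizon(notes, diagnosis_date, horizon_days=183):
    """Keep only notes dated earlier than `horizon_days` before diagnosis.

    Assumption: the horizon is measured back from the diagnosis date and
    six months is approximated as 183 days.
    """
    cutoff = diagnosis_date - timedelta(days=horizon_days)
    return [n for n in notes if n["date"] < cutoff]


if __name__ == "__main__":
    # Hypothetical site labels for five samples.
    sites = ["A", "A", "B", "C", "C"]
    for test_site, outer_train, test_idx, inner_folds in nested_leave_one_site_out(sites):
        print(test_site, outer_train, test_idx, inner_folds)

    notes = [{"date": date(2023, 6, 1)}, {"date": date(2024, 3, 1)}]
    print(notes_before_horizon(notes, date(2024, 7, 1)))
```

In practice the index lists would be passed to whichever classifier is being tuned and evaluated; libraries such as scikit-learn offer an equivalent group-based splitter, which may be preferable to hand-rolled splits.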

Provider:

BDSP

Created:

2026-03-03
