
Narcolepsy Risk Estimation from Clinical Notes

Source: DataCite Commons. Updated 2026-03-03; indexed 2026-03-29.
Download link:
https://bdsp.io/content/qsyoj1ut1t90zzl9ajpr/1.0/
Description:
Narcolepsy is a chronic neurological disorder that is often underdiagnosed and subject to long diagnostic delays. We developed and validated machine learning models to phenotype narcolepsy type 1 (NT1) and narcolepsy type 2/idiopathic hypersomnia (NT2/IH) using electronic health record (EHR) data from five sites within the Brain Data Science Platform (BDSP): Mass General Brigham, Beth Israel Deaconess Medical Center, Boston Children's Hospital, Stanford University, and Emory University. Physician reviewers manually annotated clinical notes following a standardized protocol, and model features were derived from ICD codes, medication orders, and keywords extracted from the notes with natural language processing (NLP).

For cross-sectional classification, we trained logistic regression, random forest, gradient boosting, and XGBoost models using nested leave-one-site-out cross-validation. NT1 classification achieved a mean area under the receiver operating characteristic curve (AUROC) of 0.991-0.994 and a mean area under the precision-recall curve (AUPRC) of 0.906-0.935. NT2/IH classification was more challenging, with mean AUROCs of 0.967-0.984 and AUPRCs of 0.692-0.778.

For longitudinal prediction, we trained L1-regularized logistic regression models, optimized with stochastic gradient descent (SGD), on cumulative NLP features from pre-diagnostic notes. A 6-month horizon before diagnosis was excluded so that the models could not learn from features of the diagnostic workup itself. Leave-one-site-out validation achieved AUROCs of 0.80 for any narcolepsy and 0.79 for NT1, supporting identification of at-risk patients before clinical diagnosis.

Here we release the associated data and code to support reproducible research in narcolepsy phenotyping from large-scale EHR data.
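The nested leave-one-site-out scheme can be sketched as follows. This is a minimal illustration, not the released code: `loso_splits` and `nested_loso_splits` are hypothetical names, and the sketch assumes each patient record carries exactly one site label. The outer loop holds out one whole site for testing; the inner loop repeats the same procedure over the remaining sites, which is where hyperparameters would be tuned without ever touching the held-out site.

```python
from typing import Iterator, List, Sequence, Tuple

# (held-out site, training indices, test indices)
Split = Tuple[str, List[int], List[int]]


def loso_splits(site_labels: Sequence[str]) -> Iterator[Split]:
    """Yield one split per distinct site, holding that site out for testing."""
    for held_out in sorted(set(site_labels)):
        train_idx = [i for i, s in enumerate(site_labels) if s != held_out]
        test_idx = [i for i, s in enumerate(site_labels) if s == held_out]
        yield held_out, train_idx, test_idx


def nested_loso_splits(
    site_labels: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int], List[Split]]]:
    """Outer LOSO for evaluation; inner LOSO over the outer training sites for tuning."""
    for held_out, outer_train, outer_test in loso_splits(site_labels):
        inner_labels = [site_labels[i] for i in outer_train]
        inner_folds = [
            # Map positions within outer_train back to indices in the full dataset.
            (val_site, [outer_train[j] for j in fit], [outer_train[j] for j in val])
            for val_site, fit, val in loso_splits(inner_labels)
        ]
        yield held_out, outer_train, outer_test, inner_folds


if __name__ == "__main__":
    # Illustrative site labels, one per patient (not real data).
    sites = ["MGB", "MGB", "BIDMC", "BCH", "Stanford", "Emory", "Emory"]
    for held_out, train, test, inner in nested_loso_splits(sites):
        print(held_out, "test:", test, "inner folds:", len(inner))
```

Holding out entire sites, rather than random patients, tests whether a model transfers to an institution whose documentation habits and coding practices it has never seen.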
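For reference, the two reported metrics can be computed as below. This sketch assumes AUPRC is estimated as average precision (a step-wise sum of precision weighted by recall increments); the description does not say which estimator was used, and trapezoidal estimates can differ slightly. AUROC is computed directly as the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. The function names are hypothetical.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg): fine for illustration, slow for large cohorts
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC: precision at each threshold, weighted by the recall gained there."""
    total_pos = sum(1 for y in y_true if y == 1)
    if total_pos == 0:
        raise ValueError("AUPRC is undefined without positives")
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Treat all examples tied at this score as a single threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

For the toy input `y_true = [0, 0, 1, 1]` with scores `[0.1, 0.4, 0.35, 0.8]`, these return 0.75 and about 0.833. Unlike AUROC, AUPRC depends on the prevalence of positives, so the two metrics can diverge when the positive class is rare.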
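To make "regularized logistic regression (SGD with L1 penalty)" concrete, here is a dependency-free sketch of one way to train such a model: per-sample gradient steps on the log-loss followed by a soft-thresholding (proximal) step that pulls each weight toward exactly zero. This is an illustrative stand-in, not the authors' implementation; in scikit-learn the comparable estimator would typically be `SGDClassifier(loss="log_loss", penalty="l1")`, which applies the L1 penalty with a different (truncated-gradient) update. The names `train_l1_logreg_sgd` and `predict_proba`, and the default `alpha`, `lr`, and `epochs`, are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_l1_logreg_sgd(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    alpha: float = 0.01,  # L1 strength (hypothetical default)
    lr: float = 0.1,      # constant learning rate (hypothetical default)
    epochs: int = 50,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Proximal SGD for L1-penalized logistic regression; returns (weights, intercept)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    rng = random.Random(seed)
    order = list(range(len(X)))
    shrink = lr * alpha
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            xi = X[i]
            p = _sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - y[i]  # gradient of the log-loss w.r.t. the linear score
            for j in range(n_features):
                w[j] -= lr * err * xi[j]
                # Soft-threshold: weights within `shrink` of zero become exactly zero.
                if w[j] > shrink:
                    w[j] -= shrink
                elif w[j] < -shrink:
                    w[j] += shrink
                else:
                    w[j] = 0.0
            b -= lr * err  # the intercept is not penalized
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return _sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

The L1 penalty is what makes this kind of model attractive for high-dimensional keyword features: many uninformative features end up with weights at or near zero, leaving a shorter list of terms to inspect.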
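One way to realize cumulative NLP features with a 6-month horizon exclusion is sketched below. It assumes each note is a `(date, text)` pair, that the index date is the diagnosis date for cases (controls would need their own index date, which is not specified here), and that "6 months" is approximated as 182 days. The keyword list is hypothetical and stands in for the study's actual lexicon; matching is simple case-insensitive substring counting, whereas a production pipeline might also handle negation and context.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, Sequence, Tuple

# Assumption: "6 months" approximated as 182 days.
HORIZON = timedelta(days=182)

# Hypothetical keyword lexicon, for illustration only.
KEYWORDS = ("cataplexy", "sleep paralysis", "hypnagogic", "excessive daytime sleepiness")


def cumulative_features(
    notes: Iterable[Tuple[date, str]],
    index_date: date,
    horizon: timedelta = HORIZON,
    keywords: Sequence[str] = KEYWORDS,
) -> Dict[str, int]:
    """Sum keyword counts over all notes written before (index_date - horizon)."""
    cutoff = index_date - horizon
    counts = {kw: 0 for kw in keywords}
    for note_date, text in notes:
        if note_date >= cutoff:
            continue  # inside the excluded horizon: likely diagnostic-workup notes
        lowered = text.lower()
        for kw in keywords:
            counts[kw] += lowered.count(kw)
    return counts
```

Moving the index date earlier and recomputing the features shows how the risk signal accumulates over a patient's record before any diagnostic workup begins.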
Provider:
BDSP
Created:
2026-03-03