PRediction Of Disease PHEnoTypes (PROPHET)
Source: DataCite Commons | Updated: 2026-04-01 | Indexed: 2026-04-25
Download link:
https://bdsp.io/content/480zbyvqxlq0agc6myih/1.0/
Resource description:
Large-scale electronic health record (EHR) phenotyping is essential for
epidemiology, outcomes research, and clinical-trial recruitment, yet existing
resources are largely single-center, limited to binary diagnostic labels, and
lack computational efficiency for deployment across millions of clinical
notes. No publicly available, multicenter annotated EHR resource exists for
neurological phenotyping spanning diagnoses, severity scales, and outcomes.
We assembled a multicenter, de-identified EHR resource spanning six U.S.
health systems (2010-2023) with expert-annotated reference standards. We
developed a high-throughput phenotyping framework, Prophet (PRediction Of
Disease PHEnoTypes), that combines machine learning and natural language
processing (NLP) across routinely available EHR data types. Prophet is
architected for scale and speed, enabling analysis of millions of clinical
notes in hours at low marginal cost. The modular design supports rapid
integration of additional phenotypes. We evaluated generalizability using
leave-one-site-out cross-validation with nested hyperparameter optimization.
The resource covers 17 neurological phenotypes across 18,282 unique patients
and 34,162 annotated clinical visits, encompassing acute (e.g., traumatic
brain injury, stroke) and chronic (e.g., Parkinson's disease, epilepsy)
diagnoses, as well as severity and outcomes (NIH Stroke Scale, modified Rankin
Scale). We release this large, multicenter, expert-annotated EHR dataset and
the validated, open-source phenotyping framework to enable scalable
neurological EHR research.
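
The evaluation scheme named above, leave-one-site-out cross-validation with nested hyperparameter optimization, can be illustrated with a minimal sketch. This is not Prophet's code: the toy threshold classifier, the `offset` hyperparameter, and all function names are hypothetical, chosen only to show how the held-out site stays out of both fitting and tuning.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_site_out(sites: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_site, train_positions, test_positions), one fold per site."""
    for held_out in sorted(set(sites)):
        train = [i for i, s in enumerate(sites) if s != held_out]
        test = [i for i, s in enumerate(sites) if s == held_out]
        yield held_out, train, test


def fit_midpoint(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Toy 'training': midpoint between the mean positive and mean negative score.

    Assumes both classes are present in the training data.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def fit_and_score(scores, labels, train_idx, eval_idx, offset: float) -> float:
    """Fit on train_idx, then return accuracy on eval_idx for a given offset."""
    midpoint = fit_midpoint([scores[i] for i in train_idx], [labels[i] for i in train_idx])
    threshold = midpoint + offset  # offset is the (hypothetical) hyperparameter
    hits = sum(int(scores[i] >= threshold) == labels[i] for i in eval_idx)
    return hits / len(eval_idx)


def nested_loso(scores, labels, sites, offsets) -> Dict[str, Tuple[float, float]]:
    """Return {held_out_site: (selected_offset, held_out_accuracy)}."""
    results = {}
    for held_out, train, test in leave_one_site_out(sites):
        train_sites = [sites[i] for i in train]

        def inner_score(offset: float) -> float:
            # Inner leave-one-site-out over the training sites only, so the
            # outer held-out site never influences hyperparameter selection.
            accs = []
            for _, inner_tr, inner_val in leave_one_site_out(train_sites):
                tr = [train[j] for j in inner_tr]    # map back to global indices
                val = [train[j] for j in inner_val]
                accs.append(fit_and_score(scores, labels, tr, val, offset))
            return sum(accs) / len(accs)

        best = max(offsets, key=inner_score)
        # Refit on all training sites with the selected offset, then test once.
        results[held_out] = (best, fit_and_score(scores, labels, train, test, best))
    return results
```

Calling `nested_loso(scores, labels, sites, offsets)` with one score, one binary label, and one site identifier per visit returns the selected offset and the held-out accuracy for each site. At least three sites are needed so that the inner loop still has a site to train on after one is held out for validation.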
Provider:
BDSP
Created:
2026-04-01



