
PRediction Of Disease PHEnoTypes (PROPHET)

Source: DataCite Commons. Updated 2026-04-01; indexed 2026-04-25.
Download link:
https://bdsp.io/content/480zbyvqxlq0agc6myih/1.0/
Description:
Large-scale electronic health record (EHR) phenotyping is essential for epidemiology, outcomes research, and clinical-trial recruitment, yet existing resources are largely single-center, limited to binary diagnostic labels, and lack computational efficiency for deployment across millions of clinical notes. No publicly available, multicenter annotated EHR resource exists for neurological phenotyping spanning diagnoses, severity scales, and outcomes. We assembled a multicenter, de-identified EHR resource spanning six U.S. health systems (2010-2023) with expert-annotated reference standards. We developed a high-throughput phenotyping framework--Prophet (PRediction Of Disease PHEnoTypes)--combining machine learning and natural language processing (NLP) across routinely available EHR data types. Prophet is architected for scale and speed, enabling analysis of millions of clinical notes in hours at low marginal cost. The modular design supports rapid integration of additional phenotypes. We evaluated generalizability using leave-one-site-out cross-validation with nested hyperparameter optimization. The resource covers 17 neurological phenotypes across 18,282 unique patients and 34,162 annotated clinical visits, encompassing acute (e.g., traumatic brain injury, stroke) and chronic (e.g., Parkinson's disease, epilepsy) diagnoses, as well as severity and outcomes (NIH Stroke Scale, modified Rankin Scale). We release this large, multicenter, expert-annotated EHR dataset and the validated, open-source phenotyping framework to enable scalable neurological EHR research.
Provider:
BDSP
Created:
2026-04-01
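
The description mentions leave-one-site-out cross-validation with nested hyperparameter optimization. The following is a minimal sketch of that evaluation structure only, not the Prophet implementation: the toy visits, the site names, the one-feature threshold model, and the `SHIFTS` grid are all hypothetical and chosen so the example runs with the standard library alone.

```python
# Hypothetical toy data: (site, model_score, true_label) per annotated visit.
VISITS = [
    ("A", 0.90, 1), ("A", 0.20, 0), ("A", 0.70, 1), ("A", 0.40, 0),
    ("B", 0.80, 1), ("B", 0.30, 0), ("B", 0.60, 1), ("B", 0.10, 0),
    ("C", 0.95, 1), ("C", 0.35, 0), ("C", 0.65, 1), ("C", 0.45, 0),
]
SHIFTS = [-0.1, 0.0, 0.1]  # hypothetical hyperparameter grid


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def fit(rows, shift):
    """Toy model: threshold at the midpoint of the class means, plus a shift."""
    pos = mean(s for _, s, y in rows if y == 1)
    neg = mean(s for _, s, y in rows if y == 0)
    return (pos + neg) / 2 + shift


def accuracy(rows, threshold):
    """Fraction of rows whose thresholded score matches the label."""
    return mean(float((s >= threshold) == (y == 1)) for _, s, y in rows)


def leave_one_site_out(rows):
    """Yield (held_out_site, train_rows, test_rows), one split per site."""
    for site in sorted({r[0] for r in rows}):
        train = [r for r in rows if r[0] != site]
        test = [r for r in rows if r[0] == site]
        yield site, train, test


def select_shift(train_rows):
    """Inner loop: choose the shift using only the training sites."""
    def inner_score(shift):
        return mean(
            accuracy(val, fit(inner_train, shift))
            for _, inner_train, val in leave_one_site_out(train_rows)
        )
    return max(SHIFTS, key=inner_score)


def nested_loso(rows):
    """Outer loop: tune on the training sites, then score the held-out site once."""
    results = {}
    for site, train, test in leave_one_site_out(rows):
        shift = select_shift(train)
        results[site] = (shift, accuracy(test, fit(train, shift)))
    return results


if __name__ == "__main__":
    for site, (shift, acc) in nested_loso(VISITS).items():
        print(f"held-out site {site}: shift={shift:+.1f}, accuracy={acc:.2f}")
```

The point of the nesting is that the held-out site never influences the choice of hyperparameter, so each outer score estimates performance at a site the model has not seen.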