Synthea stroke synthetic patient data series for risk prediction ML
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://doi.org/10.7910/DVN/LBD9GU
Description:
These synthetic patient datasets were created for machine learning (ML) studies of stroke risk prediction. Five populations of 30K patients each were generated with the Synthea patient generator and then combined cumulatively to form five populations of increasing size, from 30K to 150K patients. Patients with and without stroke were selected at a ratio of roughly 1:3, and their electronic health records (EHR) were processed into ML-ready data table files, in which continuous numeric values have also been converted to categorical values. Because Synthea patients closely resemble real patients, these ML-ready datasets can be used to develop and test ML algorithms and to train researchers. Unlike real patient data, they can be shared with collaborators anywhere without privacy concerns. The datasets were first used in an LHS simulation study published in Scientific Reports (Nature Portfolio) (see https://www.nature.com/articles/s41598-022-23011-4).
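The dataset construction described above can be illustrated with a small sketch. This is a hypothetical Python example, not the authors' pipeline: the record shape `(patient_id, had_stroke, age)`, the function names, the choice to keep every stroke case while sampling controls, and the age bin edges are all assumptions made for illustration. It shows cumulative combination of equal-size batches, roughly 1:3 case-control selection, and conversion of a continuous value to a category.

```python
import random
from bisect import bisect_right

# Hypothetical record shape: (patient_id, had_stroke, age).
# Real Synthea EHR exports contain many more fields; this is an assumption.

def cumulative_populations(batches):
    """Combine equal-size batches cumulatively: b1, b1+b2, b1+b2+b3, ..."""
    pops, acc = [], []
    for batch in batches:
        acc = acc + batch  # new list each step, so earlier populations stay unchanged
        pops.append(acc)
    return pops

def case_control_sample(patients, controls_per_case=3, seed=0):
    """Keep all stroke cases (an assumption) and draw up to 3 controls per case."""
    cases = [p for p in patients if p[1]]
    controls = [p for p in patients if not p[1]]
    rng = random.Random(seed)
    k = min(len(controls), controls_per_case * len(cases))
    return cases + rng.sample(controls, k)

def discretize(value, edges, labels):
    """Map a continuous value to a category; requires len(labels) == len(edges) + 1."""
    return labels[bisect_right(edges, value)]

if __name__ == "__main__":
    # Five toy batches of 1,000 patients stand in for the five 30K Synthea runs.
    rng = random.Random(42)
    batches = [
        [(f"b{b}-{i}", rng.random() < 0.05, rng.randint(18, 95)) for i in range(1000)]
        for b in range(5)
    ]
    pops = cumulative_populations(batches)
    print([len(p) for p in pops])  # [1000, 2000, 3000, 4000, 5000]
    sample = case_control_sample(pops[-1])
    age_bins = ([40, 65], ["<40", "40-64", ">=65"])  # hypothetical bin edges
    rows = [(pid, stroke, discretize(age, *age_bins)) for pid, stroke, age in sample]
    print(len(rows), rows[0])
```

With batches of 30K instead of 1,000, the same cumulative scheme yields populations of 30K, 60K, 90K, 120K, and 150K. Because `bisect_right` is used, each bin edge belongs to the higher category (an age of exactly 40 maps to `"40-64"`).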
Created:
2022-11-14



