Synthea lung cancer synthetic patient data series for ML
Source: DataONE | Updated: 2022-11-13 | Indexed: 2024-06-08
Download link:
https://search.dataone.org/view/sha256:f9896efa0b16b047299df517456e5bd1bab1dbb7e2821c40417d8fa097fde517
Description:
These synthetic patient datasets were created for machine learning (ML) studies of lung cancer risk prediction, in a simulation of ML-enabled learning health systems (LHS). Five populations of 30K patients each were generated with the Synthea patient generator. They were combined sequentially to form five populations of increasing size, from 30K to 150K patients. Patients with and without lung cancer were selected at a ratio of roughly 1:3, and their electronic health records (EHR) were processed into data table files ready for machine learning. In the ML-ready table files, continuous numeric values have also been converted to categorical values. Because Synthea patients closely resemble real patients, these ML-ready datasets can be used to develop and test ML algorithms and to train researchers. Unlike real patient data, these Synthea datasets can be shared with collaborators anywhere without privacy concerns. The datasets were first used in an LHS simulation study published in Nature Scientific Reports (see https://www.nature.com/articles/s41598-022-23011-4).
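The preparation steps described above (sequential combination of populations, roughly 1:3 case-to-control selection, and conversion of continuous values to categories) can be illustrated with a minimal sketch. This is not the authors' pipeline. The field names (`id`, `age`, `lung_cancer`), the age cut points, the helper function names, and the toy batches of 30 patients standing in for the 30K Synthea populations are all assumptions made for illustration.

```python
import random


def cumulative_populations(batches):
    """Combine batches sequentially: [b1, b1+b2, b1+b2+b3, ...]."""
    combined, populations = [], []
    for batch in batches:
        combined = combined + batch  # builds a new list at each step
        populations.append(combined)
    return populations


def select_cases_and_controls(patients, controls_per_case=3, seed=0):
    """Keep every case and a random sample of controls at about 1:3."""
    rng = random.Random(seed)
    cases = [p for p in patients if p["lung_cancer"]]
    controls = [p for p in patients if not p["lung_cancer"]]
    n_controls = min(len(controls), controls_per_case * len(cases))
    return cases + rng.sample(controls, n_controls)


def to_category(value, cut_points, labels):
    """Map a continuous value to a label; labels has one more entry than cut_points."""
    for cut, label in zip(cut_points, labels):
        if value < cut:
            return label
    return labels[-1]


# Toy stand-in for five Synthea populations (hypothetical fields and values).
batches = [
    [
        {"id": b * 30 + i, "age": 40 + i, "lung_cancer": i % 10 == 0}
        for i in range(30)
    ]
    for b in range(5)
]

populations = cumulative_populations(batches)
largest = select_cases_and_controls(populations[-1])

# Hypothetical age bins; the cut points used in the real dataset are not stated here.
rows = [
    {
        "id": p["id"],
        "age_group": to_category(p["age"], [50, 65], ["<50", "50-64", "65+"]),
        "lung_cancer": p["lung_cancer"],
    }
    for p in largest
]

print([len(p) for p in populations])
```

With real data, the toy batches would be replaced by records parsed from the Synthea EHR output, and the cut points by whatever binning the study actually used.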
Created:
2023-11-08



