A synthetic dataset for the exploration of survival and classification models: prediction of heart attack or stroke within a 10-year follow-up period
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/12567415
Resource summary:
Machine learning methodologies are increasingly popular in health care research. This shift towards integrated data science approaches requires professional development of the existing health care data analyst workforce, and educational resources need to be developed to support a smooth transition. Real healthcare datasets are vital for training in health care data analysis methods, but access to them is limited by financial and ethical barriers and by patient confidentiality concerns. Synthetic datasets that mimic real-world complexities offer a more accessible alternative.
We present a synthetic dataset that mirrors routinely collected primary care data on heart attack and stroke in the adult population. The data incorporates many of the practical challenges encountered in routinely collected primary care data, such as missing data, informative censoring, interactions, irrelevant variables, and noise, and can be used for training in methods that handle these difficulties. The intent is for users to build models of heart attack and stroke risk using survival-based methodologies.
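As a starting point for the survival-based modelling described above, the sketch below estimates a Kaplan-Meier survival curve and reads off the cumulative risk of heart attack or stroke by a chosen horizon (for example, 10 years) as 1 - S(horizon). It is a minimal, dependency-free illustration, not the authors' method. The function names `kaplan_meier` and `cumulative_risk`, the inputs `times` (follow-up in years) and `events` (1 = heart attack/stroke observed, 0 = censored), and the toy values are all hypothetical; the actual column names and coding in the dataset may differ.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return Kaplan-Meier steps as a list of (time, survival probability).

    times: follow-up time per person (hypothetical units: years).
    events: 1 if heart attack/stroke was observed at that time, 0 if censored.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    # Number of events, and of all exits (events plus censorings), at each time.
    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    exit_counts = Counter(times)
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Everyone with an event or censored at t leaves the risk set.
        at_risk -= exit_counts[t]
    return curve


def cumulative_risk(curve, horizon):
    """Estimated risk of an event by `horizon`, i.e. 1 - S(horizon)."""
    survival = 1.0
    for t, s in curve:
        if t > horizon:
            break
        survival = s
    return 1.0 - survival


if __name__ == "__main__":
    # Toy records made up for illustration only; not taken from the dataset.
    times = [2, 3, 3, 5, 7, 8, 10, 10]
    events = [1, 0, 1, 1, 0, 1, 0, 0]
    curve = kaplan_meier(times, events)
    print(round(cumulative_risk(curve, 10), 3))
```

Kaplan-Meier assumes censoring is non-informative, so on data that deliberately include informative censoring it is best treated as a descriptive baseline; regression approaches such as Cox proportional hazards models, combined with methods for missing data, would be the natural next step.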
By sharing this synthetic dataset openly, our goal is to contribute a transformative asset for professional training in health and social care data analysis. The dataset covers demographics, lifestyle variables, comorbidities, systolic blood pressure, hypertension treatment, family history of cardiovascular disease, respiratory functioning, and occurrence of heart attack and/or stroke. This initiative aims to address the shortage of realistic healthcare datasets for training, fostering professional development of the health and social care research workforce.
This study is funded by the National Institute for Health and Care Research ARC Wessex and the National Centre for Research Methods. The views expressed in this summary are those of the author(s) and not necessarily those of the National Institute for Health and Care Research or the Department of Health and Social Care.
Created:
2024-06-27



