Stroke Smart Healthcare Prediction
Source: DataCite Commons. Updated: 2026-04-22. Indexed: 2026-05-04.
Download link:
https://data.mendeley.com/datasets/g9vp7hgj7d/2
Description:
This dataset is an integrated healthcare dataset derived from a multi-source data warehouse implemented using a Medallion architecture. The Gold layer represents the final analytical output and contains 143,960 patient records generated through the integration of three publicly available Kaggle datasets covering stroke, heart disease, and cardiovascular conditions. The dataset is designed to provide a unified and enriched representation of patient-level clinical and demographic information, enabling comprehensive healthcare analytics and research.
The dataset is tabular and organized in a relational format suitable for analytical processing. It consolidates heterogeneous healthcare records into a single unified schema, ensuring consistent feature representation across all entries. Each record corresponds to an individual patient and captures multiple dimensions of health-related information, including demographic characteristics, clinical measurements, and lifestyle indicators. The integration process harmonizes and standardizes all data, facilitating reliable comparative analysis and modeling.
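As a rough illustration of the harmonization described above, the sketch below maps records from three differently shaped sources onto one shared schema. The source keys, column names, and mapping tables are hypothetical and are not taken from the dataset's documentation; the actual pipeline may differ.

```python
# Hypothetical column mappings: raw source column -> unified column.
# None of these names are confirmed by the dataset description.
COLUMN_MAPS = {
    "stroke": {"age": "age", "gender": "gender", "avg_glucose_level": "glucose"},
    "heart": {"Age": "age", "Sex": "gender", "RestingBP": "resting_bp"},
    "cardio": {"age_years": "age", "sex": "gender", "ap_hi": "systolic_bp"},
}

# Assumed unified schema (a small subset, for illustration only).
UNIFIED_COLUMNS = ["source", "age", "gender", "glucose", "resting_bp", "systolic_bp"]


def harmonize(record, source):
    """Rename a raw record's fields to the unified schema.

    Fields the source does not provide stay None; unmapped raw fields are dropped.
    """
    mapping = COLUMN_MAPS[source]
    unified = {col: None for col in UNIFIED_COLUMNS}
    unified["source"] = source
    for raw_name, value in record.items():
        target = mapping.get(raw_name)
        if target is not None:
            unified[target] = value
    return unified


# Made-up example rows, one per source.
raw_batches = {
    "stroke": [{"age": 67, "gender": "Male", "avg_glucose_level": 228.69}],
    "heart": [{"Age": 40, "Sex": "M", "RestingBP": 140}],
    "cardio": [{"age_years": 50, "sex": "F", "ap_hi": 110}],
}

gold_rows = [
    harmonize(rec, src) for src, recs in raw_batches.items() for rec in recs
]
```

Every row in `gold_rows` carries the same keys in the same order, which is the property a unified schema needs. A real pipeline would also have to standardize value encodings (for example "Male" versus "M"), which this sketch leaves out.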
The final dataset consists of 25 features encompassing several categories. Demographic attributes include age, gender, marital status, work type, and residence type. Clinical measurements include glucose levels, body mass index (BMI), and blood pressure indicators. Cardiovascular-related features are aggregated to provide composite indicators such as average height, weight, systolic and diastolic pressure, cholesterol, and glucose levels. Additional heart-related metrics include resting blood pressure, maximum heart rate, and other derived indicators. A target variable is included to represent stroke occurrence. All features are encoded in numerical or categorical formats to support analytical workflows.
The dataset was developed to support advanced healthcare analytics by integrating multiple data sources into a cohesive structure that reflects the complex relationships among cardiovascular and cerebrovascular risk factors. By combining diverse datasets into a unified analytical layer, it enables consistent analysis and supports the development of predictive and decision-support applications in healthcare contexts.
The dataset is stored in tabular format and can be exported as CSV files, ensuring compatibility with common data analysis tools. All underlying data sources are publicly available and fully anonymized, containing no personally identifiable information. The dataset is intended strictly for research purposes and adheres to ethical standards for data usage in healthcare analytics.
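Because the data can be exported as CSV, a first sanity check after download might look like the sketch below, which counts rows, columns, and positive target rows using only the Python standard library. The target column name `stroke`, its encoding as `1`, and the file name `gold_layer.csv` are assumptions; the description does not state them.

```python
import csv
import io


def summarize(csv_file, target="stroke"):
    """Return (row count, column count, number of rows whose target equals "1").

    `target` is an assumed column name, not one confirmed by the description.
    """
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames or []
    rows = 0
    positives = 0
    for row in reader:
        rows += 1
        if row.get(target) == "1":
            positives += 1
    return rows, len(columns), positives


# Tiny in-memory stand-in for the export. With the real file, one could pass
# open("gold_layer.csv", newline="") instead (hypothetical file name).
sample = io.StringIO("age,gender,stroke\n67,Male,1\n40,Female,0\n")
n_rows, n_cols, n_pos = summarize(sample)
print(n_rows, n_cols, n_pos)
```

On the actual export, the row count should come out at 143,960. The description lists 25 features, but it does not say whether the target variable is counted among them, so the column count may be 25 or 26.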
Provider:
Mendeley Data
Created:
2026-04-22



