Multi-Class Chronic Disease Data Warehouse (healthcare)

Source: DataCite Commons. Updated 2026-04-20. Indexed 2026-05-04.
Download link:
https://data.mendeley.com/datasets/6vnkkf5hv3/1
Description:
This dataset is an integrated medical data warehouse built to support multi-class chronic disease prediction. It combines three publicly available healthcare datasets from Kaggle (diabetes, heart disease, and hypertension), unified using a Medallion Architecture (Bronze, Silver, Gold) implemented in Microsoft SQL Server. The final Gold-layer dataset contains 280,985 patient records and 38 features, with no missing values. Each record corresponds to one patient and carries both a binary classification (Normal/Abnormal) and an 8-class sublabel representing disease combinations, which enables co-morbidity analysis and predictive modeling.
The dataset is structured as a denormalized flat table derived from a star schema and captures patient profiles across five domains: demographic attributes (e.g., age, gender), anthropometric measures (e.g., BMI), lifestyle indicators (e.g., smoking, physical activity, stress), clinical measurements (e.g., glucose, HbA1c, cholesterol, blood pressure), and disease indicators. Features include both categorical and continuous variables, such as normalized age, lipid profiles, inflammatory markers (CRP), and cardiovascular metrics. Disease status is encoded through binary flags and a composite categorical sublabel covering all eight possible combinations of diabetes (DI), heart disease (HT), and hypertension (HY).
The dataset was designed to address the limitations of single-disease modeling by enabling simultaneous prediction of multiple chronic conditions. It supports the study of shared risk factors and cross-disease interactions by providing a unified feature space for machine learning applications. This facilitates the development of clinical decision-support systems for early detection, risk stratification, and holistic patient assessment. Provided as a UTF-8 encoded CSV file, the dataset is compatible with major analytical platforms such as Python, R, and SQL tools.
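The stated shape of the Gold-layer table (280,985 records, 38 features, no missing values) can be checked after download. The following is a minimal sketch using only the Python standard library. It assumes the CSV has a single header row and that a missing value would appear as an empty cell; the function names are illustrative and not part of the dataset.

```python
import csv

def summarize_csv(path: str) -> dict:
    """Count data rows, columns, and empty cells in a UTF-8 CSV with one header row."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # assumption: the first row holds the column names
        n_rows = 0
        n_empty = 0
        for row in reader:
            n_rows += 1
            # assumption: a missing value shows up as an empty or whitespace-only cell
            n_empty += sum(1 for cell in row if cell.strip() == "")
    return {"rows": n_rows, "columns": len(header), "empty_cells": n_empty}

def matches_description(summary: dict) -> bool:
    """Compare a summary against the figures quoted in the dataset description."""
    return summary == {"rows": 280_985, "columns": 38, "empty_cells": 0}
```

Calling `matches_description(summarize_csv(path))` on the downloaded file should return `True` if the file matches the description. Note that the listing does not say whether the 38 features include the label columns, so a different column count may simply reflect that.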
Ethical considerations are addressed through full anonymization of all source data, with no personally identifiable information included. Potential applications include multi-class classification, co-morbidity analysis, feature importance studies, and benchmarking of machine learning models. It also serves as an educational resource for data warehousing and healthcare analytics. Keywords associated with the dataset include chronic disease classification, Medallion Architecture, clinical decision support, and machine learning-based healthcare analytics.
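To illustrate how three binary disease flags yield an 8-class sublabel, here is a small sketch. The codes DI, HT, and HY come from the description. The label format (codes joined with "+", and "Normal" when no flag is set) and the rule that any set flag means "Abnormal" are assumptions made for illustration, not the dataset's documented encoding.

```python
from itertools import product

# Disease codes as listed in the description: diabetes, heart disease, hypertension.
CODES = ("DI", "HT", "HY")

def sublabel(diabetes: int, heart_disease: int, hypertension: int) -> str:
    """Map three binary flags to one composite label (hypothetical label format)."""
    flags = (diabetes, heart_disease, hypertension)
    present = [code for code, flag in zip(CODES, flags) if flag]
    return "+".join(present) if present else "Normal"

def binary_label(diabetes: int, heart_disease: int, hypertension: int) -> str:
    """Assumption: a record is 'Abnormal' if at least one disease flag is set."""
    return "Abnormal" if any((diabetes, heart_disease, hypertension)) else "Normal"

# Enumerate every flag combination: 2 ** 3 = 8 distinct sublabels.
ALL_SUBLABELS = [sublabel(*flags) for flags in product((0, 1), repeat=3)]
```

Under these assumptions a patient with diabetes and hypertension but no heart disease gets the sublabel `DI+HY` and the binary label `Abnormal`.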
Provider:
Mendeley Data
Created:
2026-04-20