Diabetes Diagnosis Dataset for Predictive Modeling and Machine Learning
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/t8hnnyr5ph
Resource description:
This dataset contains clinical and demographic information of patients collected for the purpose of diabetes prediction and analysis using machine learning techniques. The dataset is structured in CSV format and includes several key medical attributes that are widely used in diabetes diagnosis and research studies.
The primary objective of this dataset is to facilitate binary classification of diabetes status and to support research in medical data analysis, predictive modeling, feature selection, and explainable artificial intelligence (XAI). It can be used to develop, train, validate, and benchmark machine learning and deep learning models for early-stage diabetes detection.
Dataset Structure
Each row in the dataset represents a single patient record, while each column corresponds to a clinical or physiological feature. The dataset includes the following attributes:
1. Pregnancies – Number of times pregnant
2. Glucose – Plasma glucose concentration
3. BloodPressure – Diastolic blood pressure (mm Hg)
4. SkinThickness – Triceps skin fold thickness (mm)
5. Insulin – 2-hour serum insulin (μU/mL)
6. BMI – Body Mass Index (weight in kg / (height in m)²)
7. DiabetesPedigreeFunction – A function indicating hereditary influence
8. Age – Age of the patient (years)
9. Outcome – Diabetes status (0 = Non-diabetic, 1 = Diabetic)
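The attribute list above can be turned into a small loading check. The following is a minimal sketch, not an official loader: it assumes the CSV header uses exactly the attribute names listed above and that every value parses as a number. The function name `load_records` and the two sample rows are hypothetical placeholders and are not taken from the dataset.

```python
import csv
import io

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def load_records(handle):
    """Read patient records from an open CSV handle and check the schema."""
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    records = []
    for row in reader:
        # Assumption: every attribute is numeric, so parse all as float.
        record = {name: float(row[name]) for name in EXPECTED_COLUMNS}
        if record["Outcome"] not in (0.0, 1.0):
            raise ValueError(f"Outcome must be 0 or 1, got {row['Outcome']}")
        record["Outcome"] = int(record["Outcome"])  # 0 = Non-diabetic, 1 = Diabetic
        records.append(record)
    return records


# Placeholder rows for illustration only; NOT real values from the dataset.
SAMPLE_CSV = (
    "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,"
    "DiabetesPedigreeFunction,Age,Outcome\n"
    "2,120,70,25,90,28.5,0.45,35,0\n"
    "4,160,85,30,150,33.1,0.80,50,1\n"
)

records = load_records(io.StringIO(SAMPLE_CSV))
print(len(records), "records loaded")
```

To run the same check on the downloaded file, pass an open file handle instead of the in-memory sample, for example `load_records(open(path, newline=""))`, where `path` is wherever the CSV was saved locally.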
Key Features
1. Clean tabular structure suitable for supervised learning tasks
2. Balanced numerical attributes for classification modeling
3. Supports feature selection, optimization, and explainable AI analysis
4. Applicable for academic research, teaching, and benchmarking
Potential Applications
1. Diabetes risk prediction
2. Medical decision support systems
3. Machine learning classification experiments
4. Feature importance analysis
5. Explainable AI (XAI) studies using SHAP, LIME, Grad-CAM (for hybrid models)
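For classification experiments on the Outcome column, results are commonly summarized with confusion-matrix metrics. The sketch below is one possible way to compute them in plain Python, treating 1 (Diabetic) as the positive class. The function names and the example label lists are hypothetical and are not results from this dataset.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), treating label 1 (Diabetic) as positive."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for 0/1 labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels for illustration only; not model output on this dataset.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Recall on the positive class is often the metric of most interest in screening settings, since a false negative corresponds to a missed diabetic case; which metric to prioritize remains a choice for the individual study.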
File Format
CSV (.csv)
Usage Rights
This dataset is shared for research and educational purposes only.
Created:
2026-02-15



