Predicting Diabetes From Tracking Medical Records

Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/nxnty5g7y6
Description:
This dataset contains 1,168 medical records designed for predicting the onset of diabetes based on routine diagnostic measurements. Each record includes eight clinical features commonly collected during standard health screenings, along with a binary outcome variable indicating whether the patient was diagnosed with diabetes.

Features:
Pregnancies: Number of times the patient has been pregnant
Glucose: Plasma glucose concentration from a 2-hour oral glucose tolerance test (mg/dL)
BloodPressure: Diastolic blood pressure (mm Hg)
SkinThickness: Triceps skinfold thickness (mm)
Insulin: 2-hour serum insulin level (μU/mL)
BMI: Body mass index calculated as weight in kg / (height in m)²
DiabetesPedigreeFunction: A composite score reflecting the likelihood of diabetes based on family history
Age: Age of the patient in years

Target:
Outcome: Binary target variable (1 = diabetes diagnosed, 0 = no diabetes)

The dataset comprises 771 negative cases and 397 positive cases, representing a class imbalance ratio of approximately 66:34. Patient ages range from 21 to 81 years. Some feature columns contain zero values (e.g., Glucose, BloodPressure, SkinThickness, Insulin, BMI) that likely represent missing or unrecorded measurements rather than true biological zeros; researchers should account for this during preprocessing.

This dataset is well suited for supervised binary classification tasks and can be used to benchmark machine learning models such as logistic regression, decision trees, random forests, gradient boosting, support vector machines, and neural networks. It is also appropriate for educational purposes in data science and healthcare analytics curricula, including exercises in exploratory data analysis, feature engineering, handling missing values, class imbalance techniques, and model evaluation.

The data was prepared and exported from VertexMD, a local-first electronic health records application designed for personal medical record tracking and interoperability research.
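
The description asks researchers to account for zero values that likely stand for missing measurements, and it notes the 771:397 class split. The sketch below shows one possible way to handle both using only the Python standard library. It is an illustration under assumptions, not the dataset authors' method: the helper names (`mark_zeros_missing`, `impute_median`, `balanced_class_weights`) are hypothetical, median imputation is just one common choice, and the three sample rows are made up rather than taken from the dataset.

```python
from statistics import median

# Columns where the description says 0 likely means "not recorded".
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def mark_zeros_missing(rows, columns=ZERO_AS_MISSING):
    """Return copies of the rows with 0 replaced by None in the given columns."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's data is not mutated
        for col in columns:
            if new_row[col] == 0:
                new_row[col] = None
        cleaned.append(new_row)
    return cleaned


def impute_median(rows, columns=ZERO_AS_MISSING):
    """Fill None values with the median of the observed values per column.

    Median imputation is an assumption made for this sketch; dropping rows
    or model-based imputation are equally valid alternatives.
    """
    filled = [dict(row) for row in rows]
    for col in columns:
        observed = [r[col] for r in filled if r[col] is not None]
        if not observed:
            continue  # nothing to impute from; leave the column untouched
        fill_value = median(observed)
        for r in filled:
            if r[col] is None:
                r[col] = fill_value
    return filled


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n_samples, n_classes = len(labels), len(counts)
    return {y: n_samples / (n_classes * c) for y, c in counts.items()}


# Made-up rows for illustration only; these are not records from the dataset.
sample = [
    {"Glucose": 148, "BloodPressure": 72, "SkinThickness": 35,
     "Insulin": 0, "BMI": 33.6, "Outcome": 1},
    {"Glucose": 85, "BloodPressure": 66, "SkinThickness": 29,
     "Insulin": 94, "BMI": 26.6, "Outcome": 0},
    {"Glucose": 0, "BloodPressure": 64, "SkinThickness": 0,
     "Insulin": 168, "BMI": 23.3, "Outcome": 1},
]

cleaned = mark_zeros_missing(sample)
filled = impute_median(cleaned)

# Class weights for the split reported in the description (771 vs. 397).
weights = balanced_class_weights([0] * 771 + [1] * 397)
```

With the reported class counts, the weights come out to roughly 0.76 for the negative class and 1.47 for the positive class, which can be passed to any classifier that accepts per-class weights. To avoid leaking information, the imputation medians would normally be computed on the training split only and then applied to the test split.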
Created:
2026-04-20