
Diabetes Risk Prediction Dataset

Alibaba Cloud Tianchi · Updated 2026-06-09 · Indexed 2025-12-13
Download link:
https://tianchi.aliyun.com/dataset/216181
Resource description:
The Diabetes Risk Prediction Dataset is derived from the classic Pima Indians Diabetes Dataset, an entry-level benchmark dataset for diabetes risk prediction in machine learning.

1. Application scenarios
- Introductory teaching: The dataset is small (768 records) and clearly structured, so beginners can practice basic classification models such as logistic regression, decision trees, and random forests, and quickly check how an algorithm performs.
- Baseline model construction: As a benchmark dataset for diabetes risk prediction, it can be used to measure how much newer algorithms (such as deep learning) improve over a baseline.
- Feature importance analysis: It gives a direct view of how strongly factors such as blood glucose, BMI, and age weigh on predicted diabetes risk, which helps check results against established medical knowledge.

2. Limitations (note)
- Population: The data covers only Pima Indian women aged 21 and above, so it cannot be generalized to other ethnic groups, sexes, or age groups (for example, it is not suitable for risk models for men or adolescents).
- Timeliness: The data was collected in the 1980s and does not include modern lifestyle factors such as high-sugar diets or sedentary time, so its risk characteristics may differ from those of current populations.
- Indicators: Key indicators such as glycated hemoglobin (HbA1c) and blood lipids are missing, so long-term blood glucose control and metabolic risk cannot be assessed comprehensively.
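As a rough illustration of the teaching, baseline, and feature-weight uses described above, the following is a minimal sketch of logistic regression written with only the Python standard library. Several things here are assumptions rather than facts from this listing: the column names `Glucose`, `BMI`, `Age`, and `Outcome` follow the commonly distributed Pima CSV and are not confirmed by this page, the file name `diabetes.csv` is hypothetical, and `synthetic_rows` produces made-up stand-in data so the sketch runs without the download.

```python
import csv
import math
import random

# Assumed column names (common Pima CSV layout; not stated in this listing).
FEATURES = ["Glucose", "BMI", "Age"]
LABEL = "Outcome"


def load_rows(path):
    """Read a CSV with a header row into feature vectors and 0/1 labels."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    X = [[float(r[c]) for c in FEATURES] for r in rows]
    y = [int(r[LABEL]) for r in rows]
    return X, y


def fit_scaler(X):
    """Per-column mean and standard deviation, computed on training data only."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(X, means, stds):
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def sigmoid(z):
    # Branching keeps math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(X, y, w, b):
    hits = sum((predict_proba(xi, w, b) >= 0.5) == bool(yi) for xi, yi in zip(X, y))
    return hits / len(y)


def synthetic_rows(n=768, seed=0):
    """Made-up stand-in data with the same shape; NOT the real dataset."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        glucose, bmi, age = rng.gauss(120, 30), rng.gauss(32, 7), rng.uniform(21, 80)
        risk = sigmoid(0.04 * (glucose - 120) + 0.08 * (bmi - 32) + 0.02 * (age - 35))
        X.append([glucose, bmi, age])
        y.append(1 if rng.random() < risk else 0)
    return X, y


X, y = synthetic_rows()  # replace with load_rows("diabetes.csv") for the real file
split = int(0.8 * len(X))
means, stds = fit_scaler(X[:split])
X_train, X_test = scale(X[:split], means, stds), scale(X[split:], means, stds)
y_train, y_test = y[:split], y[split:]

w, b = fit_logistic(X_train, y_train)
test_acc = accuracy(X_test, y_test, w, b)
majority_acc = max(sum(y_test), len(y_test) - sum(y_test)) / len(y_test)

print("test accuracy:", round(test_acc, 3))
print("majority-class baseline:", round(majority_acc, 3))
print("weights (Glucose, BMI, Age):", [round(v, 3) for v in w])
```

Because the features are standardized, the learned weights are on a comparable scale and give a first, rough reading of relative feature influence. Comparing `test_acc` with `majority_acc` shows whether the model beats the trivial strategy of always predicting the more common class. Numbers produced on the synthetic stand-in say nothing about the real data.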
Provider:
Alibaba Cloud Tianchi
Created:
2025-12-08
Collected summary
Dataset introduction
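Background and challenges
Background overview
The Diabetes Risk Prediction Dataset is a classic introductory machine learning dataset with 768 records, suited to practicing basic classification models and to diabetes risk prediction research. However, it is limited in population coverage, timeliness, and available indicators, so it applies only to specific populations and research scenarios.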
The content above was collected and summarized by 遇见数据集.