Cardiovascular Disease Prediction Dataset

Name: Cardiovascular Disease Prediction Dataset
Creator: Alibaba Cloud Tianchi
Published: 2026-06-09 23:53:00
License: No description provided

Alibaba Cloud Tianchi, updated 2026-06-09, indexed 2025-10-18

Download link:

https://tianchi.aliyun.com/dataset/212054

Description:

Dataset Overview
This dataset contains 70,000 patient medical records for cardiovascular disease risk prediction research. It is sourced from real medical examination data and covers clinical indicators and lifestyle factors closely related to cardiovascular health.

Key Feature Variables
The dataset has 12 columns: 11 feature variables and 1 target variable.

Demographic features:
- age - Patient age (years)
- gender - Gender (1: Female, 2: Male)
- height - Height (cm)
- weight - Weight (kg)

Clinical measurements:
- ap_hi - Systolic blood pressure (mmHg)
- ap_lo - Diastolic blood pressure (mmHg)
- cholesterol - Cholesterol level (1: Normal, 2: Elevated, 3: High)
- gluc - Blood glucose level (1: Normal, 2: Elevated, 3: High)

Lifestyle factors:
- smoke - Smoking habit (0: Non-smoker, 1: Smoker)
- alco - Alcohol consumption (0: Non-drinker, 1: Drinker)
- active - Physical activity level (0: Inactive, 1: Active)

Target variable:
- cardio - Cardiovascular disease diagnosis (0: No disease, 1: Disease present)

Dataset Characteristics
- Data scale: Approximately 70,000 records, a suitable size for machine learning modeling
- Class balance: The ratio of positive to negative samples is close to 1:1, so class imbalance is not a concern
- Feature diversity: Numerical, categorical, and binary features
- Practical significance: Every feature has a clear clinical meaning and medical interpretation

Data Quality
- A small share of values (<2%) is missing, which can be handled by imputation
- Some continuous variables, such as blood pressure, contain outliers that should be checked for physiological plausibility
- Some features are correlated; for example, systolic and diastolic blood pressure are highly correlated

Application Value
This dataset is well suited for:
- Developing and comparing binary classification models
- Feature importance analysis and interpretable AI research
- Building medical risk prediction models
- Case studies of machine learning in healthcare
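The column codes listed under Key Feature Variables and the claims under Data Quality and Class Balance can be checked before any modeling. The Python sketch below is a minimal, standard-library-only check under stated assumptions: records are dicts keyed by the column names above, the hypothetical load_rows helper assumes a CSV file with a header row (the file name and delimiter are not given on this page), and the blood-pressure ranges are illustrative bounds, not clinical thresholds from the dataset card.

```python
import csv

# Category codes as documented on this dataset page.
CODES = {
    "gender": {1: "female", 2: "male"},
    "cholesterol": {1: "normal", 2: "elevated", 3: "high"},
    "gluc": {1: "normal", 2: "elevated", 3: "high"},
    "smoke": {0: "non-smoker", 1: "smoker"},
    "alco": {0: "non-drinker", 1: "drinker"},
    "active": {0: "inactive", 1: "active"},
    "cardio": {0: "no disease", 1: "disease"},
}
FEATURES = ["age", "gender", "height", "weight", "ap_hi", "ap_lo",
            "cholesterol", "gluc", "smoke", "alco", "active"]
TARGET = "cardio"

# ASSUMPTION: illustrative plausibility windows in mmHg, not from the dataset page.
SYSTOLIC_RANGE = (60, 250)
DIASTOLIC_RANGE = (30, 200)


def parse_value(v):
    """Convert one CSV cell to float; empty or whitespace-only cells become None."""
    if v is None or v.strip() == "":
        return None
    return float(v)


def parse_row(raw):
    """Convert a csv.DictReader row (all strings) into numbers."""
    return {k: parse_value(v) for k, v in raw.items()}


def load_rows(path, delimiter=","):
    """Hypothetical loader: path and delimiter are assumptions, adjust to the real file."""
    with open(path, newline="", encoding="utf-8") as f:
        return [parse_row(r) for r in csv.DictReader(f, delimiter=delimiter)]


def invalid_codes(rows):
    """List (row_index, column, value) for categorical values outside the documented codes."""
    problems = []
    for i, row in enumerate(rows):
        for col, table in CODES.items():
            value = row.get(col)
            if value is not None and value not in table:
                problems.append((i, col, value))
    return problems


def missing_rate(rows, columns=None):
    """Fraction of missing (None) cells over the feature and target columns."""
    columns = columns or FEATURES + [TARGET]
    total = len(rows) * len(columns)
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)
    return missing / total if total else 0.0


def positive_share(rows):
    """Share of labelled rows with cardio == 1 (close to 0.5 if classes are balanced)."""
    labels = [row[TARGET] for row in rows if row.get(TARGET) is not None]
    return sum(labels) / len(labels) if labels else float("nan")


def implausible_bp(rows):
    """Indices of rows whose blood pressure is out of range or has diastolic >= systolic."""
    flagged = []
    for i, row in enumerate(rows):
        hi, lo = row.get("ap_hi"), row.get("ap_lo")
        if hi is None or lo is None:
            continue  # missing values are counted by missing_rate instead
        in_range = (SYSTOLIC_RANGE[0] <= hi <= SYSTOLIC_RANGE[1]
                    and DIASTOLIC_RANGE[0] <= lo <= DIASTOLIC_RANGE[1])
        if not in_range or lo >= hi:
            flagged.append(i)
    return flagged
```

Flagged rows are only candidates for review; whether to drop, clip, or keep them is a modeling decision.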
Challenging Tasks
- Predicting an individual's probability of developing cardiovascular disease
- Identifying the most important risk factors
- Building accurate and interpretable prediction models
- Handling the outliers and missing values common in medical data

Its moderate size, rich features, and clear practical relevance have made this dataset one of the benchmark datasets commonly used in machine learning competitions and academic research.
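For the tasks above, one simple baseline is median imputation, clipping of extreme values, and a logistic regression that outputs a risk probability. The sketch below is not the dataset provider's method; it uses only the Python standard library and plain full-batch gradient descent so it stays self-contained. The clipping bounds, learning rate, and epoch count are assumptions, and ordinal codes such as cholesterol are treated as numbers for simplicity. On all 70,000 records a library implementation such as scikit-learn's LogisticRegression would be far faster.

```python
import math
from statistics import mean, median, pstdev

# The 11 feature columns listed on this page; "cardio" is the target.
FEATURES = ["age", "gender", "height", "weight", "ap_hi", "ap_lo",
            "cholesterol", "gluc", "smoke", "alco", "active"]


def impute_median(rows, columns=FEATURES):
    """Return new rows with missing (None) values replaced by the column median."""
    medians = {c: median(r[c] for r in rows if r.get(c) is not None) for c in columns}
    return [{**r, **{c: medians[c] for c in columns if r.get(c) is None}} for r in rows]


def clip_column(rows, column, low, high):
    """Return new rows with one column clipped (winsorized) to [low, high]."""
    return [{**r, column: min(max(r[column], low), high)} for r in rows]


def to_matrix(rows, columns=FEATURES, target="cardio"):
    """Split records into a feature matrix X and a label vector y."""
    X = [[float(r[c]) for c in columns] for r in rows]
    y = [float(r[target]) for r in rows]
    return X, y


def standardize(X):
    """Scale each column to zero mean and unit variance; constant columns keep scale 1."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    Xs = [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in X]
    return Xs, mus, sds


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by full-batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, X):
    """Predicted probability of cardio == 1 for each standardized row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def rank_features(w, names=FEATURES):
    """Sort features by absolute coefficient (meaningful only for standardized inputs)."""
    return sorted(zip(names, w), key=lambda p: abs(p[1]), reverse=True)


# Example pipeline (bounds are assumptions, not from the dataset page):
# rows = impute_median(rows)
# rows = clip_column(rows, "ap_hi", 60, 250)
# rows = clip_column(rows, "ap_lo", 30, 200)
# X, y = to_matrix(rows)
# Xs, mus, sds = standardize(X)
# w, b = fit_logistic(Xs, y)
# print(rank_features(w)[:3])
```

Because the inputs are standardized, the absolute size of each coefficient gives a rough ranking of risk factors. Correlated inputs such as ap_hi and ap_lo can share or swap weight, however, so treat that ranking as exploratory. When evaluating on held-out data, reuse the training means and standard deviations rather than recomputing them.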

Provider:

Alibaba Cloud Tianchi

Created:

2025-10-14
