idoelg/Lastone_Cardiovascular_Disease
Hugging Face | Updated 2026-04-07 | Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/idoelg/Lastone_Cardiovascular_Disease
Description:
# Cardiovascular Disease dataset
**Project by: Ido | Reichman University**
https://huggingface.co/datasets/idoelg/Lastone_Cardiovascular_Disease/resolve/main/video_ass1_EDA.mp4
<video src="https://huggingface.co/datasets/idoelg/Lastone_Cardiovascular_Disease/resolve/main/video_ass1_EDA.mp4" controls="controls" style="max-width: 720px;"></video>
Dataset name: Cardiovascular Disease dataset
Source: Kaggle
https://www.kaggle.com/datasets/sulianova/cardiovascular-disease-dataset
Research Question: *Can we effectively identify high-risk cardiovascular patients by combining clinical markers with lifestyle data, and which factors should take priority?*
## Project Overview
This project investigates a large-scale dataset of 70,000 patients to identify the most critical predictors of Cardiovascular Disease (CVD).
### Data Cleaning
Before analysis, I audited the raw data and identified and corrected significant physiological inconsistencies:
* The dataset covers adults (ages 29-64), yet it included records with a minimum weight of **10 kg** and a height of **55 cm**. These measurements characterize infants, not adults, so I removed such records to maintain model integrity.
* I removed impossible values, such as **negative blood pressure** and extreme height records (e.g., **250 cm**), treating them as data entry errors.
* I observed that in **Weight, Cholesterol, and Blood Pressure**, the mean was consistently higher than the median. This indicated a **right-skewed distribution**, where extreme outliers were biasing the averages.
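
The cleaning rules above can be sketched as a simple range filter. This is a minimal illustration, not the project's actual code: the field names (`height`, `weight`, `ap_hi`, `ap_lo`) are assumed to follow the Kaggle file, and the numeric bounds are hypothetical, since the exact cutoffs used in the project are not stated.

```python
from statistics import mean, median

# HYPOTHETICAL plausibility bounds for adults. The project does not state
# its exact cutoffs, so these numbers are assumptions for illustration only.
BOUNDS = {
    "height": (120, 220),  # cm
    "weight": (30, 250),   # kg
    "ap_hi": (60, 260),    # systolic blood pressure, mmHg
    "ap_lo": (30, 160),    # diastolic blood pressure, mmHg
}


def is_plausible(record):
    """Return True if every bounded field lies inside its assumed range."""
    return all(low <= record[field] <= high for field, (low, high) in BOUNDS.items())


def clean(records):
    """Keep only the records that pass the plausibility check."""
    return [record for record in records if is_plausible(record)]


def is_right_skewed(values):
    """Rough skew check used in the text: the mean sits above the median."""
    return mean(values) > median(values)


sample = [
    {"height": 168, "weight": 62.0, "ap_hi": 110, "ap_lo": 80},
    {"height": 55, "weight": 10.0, "ap_hi": 120, "ap_lo": 80},    # infant-sized record
    {"height": 250, "weight": 86.0, "ap_hi": 140, "ap_lo": 90},   # implausible height
    {"height": 170, "weight": 70.0, "ap_hi": -100, "ap_lo": 80},  # negative blood pressure
]
cleaned = clean(sample)
```

Comparing mean and median is only a quick signal of skew. A histogram before and after filtering gives a fuller picture.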
### Feature Engineering (Clinical Value-Add)
To elevate the analysis from raw data to clinical insights, I engineered two primary features:
* **BMI (Body Mass Index):** Calculated as $\text{BMI} = \frac{\text{weight (kg)}}{\text{height (m)}^2}$. This allows me to standardize metabolic risk across different body types.
* **Age-Adjusted BP Diagnosis:** Based on official guidelines from the Israel Ministry of Health, I built a dynamic classification system. Instead of static thresholds, this feature labels blood pressure as **Normal** or **Too High** based on the patient's specific age bracket.
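
Both features can be sketched in a few lines. The BMI function follows the formula above directly. The blood pressure limits in `BP_LIMITS_BY_AGE` are hypothetical placeholders, not the Israel Ministry of Health values, which are not reproduced in this card. The sketch also assumes age is already in years (the Kaggle file is understood to store it in days, so a conversion would come first).

```python
def bmi(weight_kg, height_cm):
    """Body Mass Index: weight in kg divided by height in metres squared."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


# HYPOTHETICAL age brackets and limits, for illustration only.
# Each row is (upper age bound in years, systolic limit, diastolic limit).
# Replace these with the official guideline values before any real use.
BP_LIMITS_BY_AGE = [
    (40, 130, 85),
    (60, 135, 88),
    (float("inf"), 140, 90),
]


def bp_diagnosis(age_years, ap_hi, ap_lo):
    """Label blood pressure as 'Normal' or 'Too High' for the patient's age bracket."""
    for max_age, systolic_limit, diastolic_limit in BP_LIMITS_BY_AGE:
        if age_years < max_age:
            too_high = ap_hi >= systolic_limit or ap_lo >= diastolic_limit
            return "Too High" if too_high else "Normal"
    raise ValueError(f"no bracket found for age {age_years}")
```

With these placeholder limits, a reading of 132/80 is labeled `Too High` for a 35-year-old but `Normal` for a 65-year-old, which is the age-dependent behavior the feature is meant to capture.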
## The Narrative: Key Insights
### 1. Question 1
**Research Question:** What is the prevalence of cardiovascular disease (CVD) within the study sample, and is there any significant statistical bias in the population distribution?
The dataset maintains a nearly perfect 50/50 split between healthy and sick patients. Because neither outcome dominates, my analysis and any future predictive models are not skewed by class imbalance and see both outcomes in roughly equal numbers.
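
A class balance check of this kind is a one-step count. The sketch below uses a tiny made-up label list, with `1` standing for a CVD diagnosis and `0` for a healthy patient (assumed to match the `cardio` target column of the Kaggle file).

```python
from collections import Counter


def class_shares(labels):
    """Return the share of each class label as a fraction of all records."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Made-up labels for illustration: 1 = CVD, 0 = healthy.
labels = [0, 1, 1, 0, 1, 0, 0, 1]
shares = class_shares(labels)
```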

### 2. Question 2
**Research Question:** Does the age-adjusted, dynamic blood pressure diagnostic framework (BP Diagnosis) serve as a reliable indicator for predicting cardiovascular disease?
Using the age-adjusted BP classification, I tested its effectiveness as a diagnostic tool.
* **Insight:** This single engineered feature successfully identifies **80% of all CVD patients**, that is, a sensitivity of about 80%. This indicates that medical thresholds are the most powerful first-line filters for risk assessment.
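
The share of CVD patients that a flag catches is its sensitivity (recall). A minimal sketch of that calculation follows. The input lists are made up, and the function name is illustrative rather than taken from the project.

```python
def sensitivity(flagged, has_cvd):
    """Share of true CVD patients that the flag catches: TP / (TP + FN)."""
    true_positives = sum(1 for f, y in zip(flagged, has_cvd) if f and y)
    false_negatives = sum(1 for f, y in zip(flagged, has_cvd) if not f and y)
    positives = true_positives + false_negatives
    if positives == 0:
        raise ValueError("no CVD patients in the sample")
    return true_positives / positives


# Made-up example: flagged = BP labeled "Too High", has_cvd = confirmed diagnosis.
flagged = [True, True, False, True, False]
has_cvd = [True, True, True, False, False]
result = sensitivity(flagged, has_cvd)  # 2 of the 3 CVD patients are flagged
```

Sensitivity says nothing about healthy patients who are flagged by mistake, so it is usually read alongside a false-positive measure.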

### 3. Question 3
**Research Question:** To what extent does the sensitivity of blood pressure as a diagnostic tool for cardiovascular disease vary across different age groups?

This analysis evaluates diagnostic efficacy among confirmed patients, showing that while high blood pressure is a highly reliable indicator for the 50-65 age group, it often fails to identify sick individuals in younger cohorts (ages 29-50). This diagnostic gap demonstrates that relying solely on blood pressure is insufficient; integrating additional markers like BMI and cholesterol is essential to minimize missed cases and improve early detection across all age groups.
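
The per-age comparison amounts to computing the same sensitivity separately for each cohort. The sketch below is illustrative: the field names (`age_years`, `cardio`, `bp_diagnosis`) are assumptions, and because the two cohorts in the text share the boundary age of 50, the code assigns age 50 to the older group as an arbitrary choice.

```python
from collections import defaultdict


def age_group(age_years):
    """Assign a cohort label; age 50 goes to the older group (an assumption)."""
    return "29-50" if age_years < 50 else "50-65"


def sensitivity_by_group(patients):
    """Among confirmed CVD patients, share labeled 'Too High' in each cohort."""
    caught = defaultdict(int)
    confirmed = defaultdict(int)
    for patient in patients:
        if not patient["cardio"]:
            continue  # sensitivity only looks at confirmed patients
        group = age_group(patient["age_years"])
        confirmed[group] += 1
        if patient["bp_diagnosis"] == "Too High":
            caught[group] += 1
    return {group: caught[group] / confirmed[group] for group in confirmed}


# Made-up records for illustration.
patients = [
    {"age_years": 35, "cardio": 1, "bp_diagnosis": "Normal"},
    {"age_years": 42, "cardio": 1, "bp_diagnosis": "Too High"},
    {"age_years": 55, "cardio": 1, "bp_diagnosis": "Too High"},
    {"age_years": 60, "cardio": 1, "bp_diagnosis": "Too High"},
    {"age_years": 58, "cardio": 1, "bp_diagnosis": "Normal"},
    {"age_years": 62, "cardio": 0, "bp_diagnosis": "Too High"},
]
by_group = sensitivity_by_group(patients)
```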
### 4. Question 4
**Research Question:** Which metabolic and lifestyle factors serve as the most significant predictors of cardiovascular disease in patients who present with clinically normal blood pressure?

This analysis examines patients with normal blood pressure to identify the variables that distinguish healthy individuals from confirmed patients. The data shows a significant increase in the prevalence of obesity, high cholesterol, and high glucose among the patient group compared to healthy individuals. These findings demonstrate that blood pressure alone is an insufficient diagnostic tool, and that incorporating BMI and blood profiles is necessary to identify morbidity within this specific category.
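
A comparison like this can be sketched by restricting to normal-BP patients and computing the prevalence of each risk flag per outcome group. The thresholds are assumptions: obesity is taken as BMI of 30 or more, and cholesterol and glucose are assumed to be coded 1 (normal), 2 (above normal), 3 (well above normal) as in the Kaggle file, so any value above 1 counts as high.

```python
# Assumed risk definitions; see the lead-in for where each cutoff comes from.
RISK_FLAGS = {
    "obese": lambda row: row["bmi"] >= 30,
    "high_cholesterol": lambda row: row["cholesterol"] > 1,
    "high_glucose": lambda row: row["gluc"] > 1,
}


def prevalence(rows, has_flag):
    """Share of rows for which the flag is true (NaN for an empty group)."""
    if not rows:
        return float("nan")
    return sum(1 for row in rows if has_flag(row)) / len(rows)


def compare_normal_bp(rows):
    """Prevalence of each risk flag among normal-BP patients, split by outcome."""
    normal = [row for row in rows if row["bp_diagnosis"] == "Normal"]
    result = {}
    for outcome, name in ((0, "healthy"), (1, "cvd")):
        group = [row for row in normal if row["cardio"] == outcome]
        result[name] = {flag: prevalence(group, test) for flag, test in RISK_FLAGS.items()}
    return result


# Made-up records for illustration.
rows = [
    {"bp_diagnosis": "Normal", "cardio": 0, "bmi": 22, "cholesterol": 1, "gluc": 1},
    {"bp_diagnosis": "Normal", "cardio": 0, "bmi": 31, "cholesterol": 1, "gluc": 1},
    {"bp_diagnosis": "Normal", "cardio": 1, "bmi": 33, "cholesterol": 3, "gluc": 1},
    {"bp_diagnosis": "Normal", "cardio": 1, "bmi": 29, "cholesterol": 2, "gluc": 2},
    {"bp_diagnosis": "Too High", "cardio": 1, "bmi": 35, "cholesterol": 3, "gluc": 3},
]
comparison = compare_normal_bp(rows)
```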
## Conclusion: Answering the Research Question
**Research Question:** Can we effectively identify high-risk cardiovascular patients by combining clinical markers with lifestyle data, and which factors should take priority?
The study confirms that high-risk cardiovascular patients can be effectively identified through a combined analytical approach. The findings establish a clear hierarchy for diagnostic priority:
1. **Primary Priority:** **Age-Adjusted Blood Pressure** serves as the most powerful first-line filter, correctly identifying **80%** of confirmed patients.
2. **Secondary Priority:** In cases where blood pressure appears normal, or among younger cohorts (**ages 29-50**), metabolic markers—specifically **BMI, Cholesterol, and Glucose**—must take priority.
**Summary:** While clinical thresholds are the strongest predictors, they are not exhaustive. A multi-layered diagnostic model that integrates age-adjusted clinical markers with metabolic and lifestyle data is essential to close the diagnostic gap and ensure early detection.
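
One way to picture the two-tier priority is a rule-based screen that checks blood pressure first and falls back to metabolic markers. This is a hypothetical sketch, not a model built in the project: the field names, the rule that any single marker is enough to flag a patient, and the cutoffs (BMI of 30 or more, cholesterol or glucose code above 1) are all assumptions.

```python
def screen(patient):
    """Return (flagged, reason) using a two-tier rule: BP first, then metabolic markers."""
    # Tier 1: age-adjusted blood pressure label.
    if patient["bp_diagnosis"] == "Too High":
        return True, "blood pressure"

    # Tier 2: metabolic markers, checked only when blood pressure looks normal.
    checks = (
        ("BMI", patient["bmi"] >= 30),
        ("cholesterol", patient["cholesterol"] > 1),
        ("glucose", patient["gluc"] > 1),
    )
    hits = [name for name, is_high in checks if is_high]
    if hits:
        return True, "metabolic: " + ", ".join(hits)
    return False, "no flag"
```

For example, a patient with normal blood pressure, a BMI of 32, and elevated glucose would be flagged at the second tier with the reason `metabolic: BMI, glucose`.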
Provided by:
idoelg



