idoelg/Lastone_Cardiovascular_Disease

Source: Hugging Face. Updated 2026-04-07. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/idoelg/Lastone_Cardiovascular_Disease
Description:
# Cardiovascular Disease dataset

**Project by: Ido | Reichman University**

https://huggingface.co/datasets/idoelg/Lastone_Cardiovascular_Disease/resolve/main/video_ass1_EDA.mp4

<video src="https://huggingface.co/datasets/idoelg/Lastone_Cardiovascular_Disease/resolve/main/video_ass1_EDA.mp4" controls="controls" style="max-width: 720px;"></video>

**Dataset name:** Cardiovascular Disease dataset

**Dataset source:** Kaggle, https://www.kaggle.com/datasets/sulianova/cardiovascular-disease-dataset

**Research Question:** *Can we effectively identify high-risk cardiovascular patients by combining clinical markers with lifestyle data, and which factors should take priority?*

## Project Overview

This project investigates a large-scale dataset of 70,000 patients to identify the most critical predictors of Cardiovascular Disease (CVD).

### Data Cleaning

Before analysis, I audited the raw data and identified and corrected significant physiological inconsistencies:

* The dataset covers adults (ages 29-64), yet it included records with a minimum weight of **10 kg** and a height of **55 cm**. These measurements characterize infants, not adults, so I removed them to maintain model integrity.
* I removed impossible values, such as **negative blood pressure** and extreme height records (e.g., **250 cm**), treating them as data entry errors.
* In **Weight, Cholesterol, and Blood Pressure**, the mean was consistently higher than the median. This indicates a **right-skewed distribution**, in which extreme outliers pull the averages upward.
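The cleaning rules above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual code. The field names (`height`, `weight`, `ap_hi`, `ap_lo`) are assumed to follow the Kaggle file, and the numeric bounds are hypothetical, because the text names only the rejected extremes (10 kg, 55 cm, 250 cm, negative blood pressure). The sample records are synthetic.

```python
from statistics import mean, median

# Hypothetical plausibility bounds for adults (assumption, not from the source).
BOUNDS = {
    "height": (120, 220),  # cm
    "weight": (30, 200),   # kg
    "ap_hi": (60, 250),    # systolic pressure, mmHg
    "ap_lo": (30, 200),    # diastolic pressure, mmHg
}


def is_plausible(record):
    """Return True when every bounded field lies inside its range."""
    in_range = all(lo <= record[key] <= hi for key, (lo, hi) in BOUNDS.items())
    # Extra assumption: systolic pressure must exceed diastolic pressure.
    return in_range and record["ap_hi"] > record["ap_lo"]


def clean(records):
    """Keep only the physiologically plausible records."""
    return [r for r in records if is_plausible(r)]


def is_right_skewed(values):
    """Crude skew check described in the text: mean above median."""
    return mean(values) > median(values)


# Synthetic records for illustration only.
sample = [
    {"height": 168, "weight": 62.0, "ap_hi": 110, "ap_lo": 80},
    {"height": 55, "weight": 10.0, "ap_hi": 120, "ap_lo": 80},    # infant-sized
    {"height": 250, "weight": 86.0, "ap_hi": 140, "ap_lo": 90},   # implausibly tall
    {"height": 170, "weight": 70.0, "ap_hi": -100, "ap_lo": 80},  # negative BP
    {"height": 160, "weight": 130.0, "ap_hi": 150, "ap_lo": 100},
    {"height": 175, "weight": 68.0, "ap_hi": 120, "ap_lo": 80},
]

cleaned = clean(sample)
print(len(cleaned))                                      # 3
print(is_right_skewed([r["weight"] for r in cleaned]))   # True
```

Comparing the mean with the median is only a quick heuristic for skew. A histogram or a skewness statistic would give a fuller picture.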
Before:

![Dataset table before data cleaning](https://cdn-uploads.huggingface.co/production/uploads/69bbf73b0b92270da8940a68/guxjOgGg8UFfH-aH_7cnL.png)

After:

![Dataset table after data cleaning](https://cdn-uploads.huggingface.co/production/uploads/69bbf73b0b92270da8940a68/OYQLj9iQRyqHV4qKSwWWY.png)

### Feature Engineering (Clinical Value-Add)

To move the analysis from raw data to clinical insights, I engineered two primary features:

* **BMI (Body Mass Index):** Calculated as $\text{BMI} = \frac{\text{weight (kg)}}{\text{height (m)}^2}$. This standardizes metabolic risk across different body types.
* **Age-Adjusted BP Diagnosis:** Based on official guidelines from the Israel Ministry of Health, I built a dynamic classification system. Instead of static thresholds, this feature labels blood pressure as **Normal** or **Too High** based on the patient's specific age bracket.

## The Narrative: Key Insights

### 1. Question 1

**Research Question:** What is the prevalence of cardiovascular disease (CVD) within the study sample, and is there any significant statistical bias in the population distribution?

The dataset maintains a nearly perfect 50/50 split between healthy and sick patients. Because the two classes are balanced, the analysis and any future predictive models are not skewed toward either outcome.

![Graph 1](https://cdn-uploads.huggingface.co/production/uploads/69bbf73b0b92270da8940a68/HfXsViBbKOPuJxP5eQQIi.png)

### 2. Question 2

**Research Question:** Does the age-adjusted, dynamic blood pressure diagnostic framework (BP Diagnosis) serve as a reliable indicator for predicting cardiovascular disease?

Using the age-adjusted BP classification, I tested its effectiveness as a diagnostic tool.

* **Insight:** This single engineered feature flags **80% of all CVD patients**, which is its sensitivity. This suggests that age-adjusted clinical thresholds are a powerful first-line filter for risk assessment.
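The two engineered features, and the sensitivity figure quoted for the BP label, can be sketched as follows. This is a hedged illustration only. The age brackets and systolic cut-offs in `SYSTOLIC_LIMIT_BY_AGE` are hypothetical placeholders and are not the Israel Ministry of Health values, which the card does not list. The field names (`age` in years, `ap_hi`, `cardio`) are assumptions, and the patient records are synthetic.

```python
# Hypothetical (upper age bound, systolic limit in mmHg) pairs.
# Placeholder numbers for illustration, NOT the official guideline values.
SYSTOLIC_LIMIT_BY_AGE = [
    (40, 130),   # age < 40
    (60, 135),   # 40 <= age < 60
    (200, 140),  # age >= 60
]


def bmi(weight_kg, height_cm):
    """BMI = weight (kg) / height (m)^2."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def bp_diagnosis(age_years, systolic):
    """Label systolic pressure against the limit of the patient's age bracket."""
    for upper_age, limit in SYSTOLIC_LIMIT_BY_AGE:
        if age_years < upper_age:
            return "Too High" if systolic >= limit else "Normal"
    raise ValueError("age outside supported brackets")


def sensitivity(records):
    """Share of confirmed CVD patients (cardio == 1) labelled 'Too High'."""
    sick = [r for r in records if r["cardio"] == 1]
    flagged = [r for r in sick if bp_diagnosis(r["age"], r["ap_hi"]) == "Too High"]
    return len(flagged) / len(sick)


# Synthetic patients for illustration only.
patients = [
    {"age": 35, "ap_hi": 120, "cardio": 1},
    {"age": 55, "ap_hi": 140, "cardio": 1},
    {"age": 62, "ap_hi": 150, "cardio": 1},
    {"age": 45, "ap_hi": 120, "cardio": 0},
    {"age": 58, "ap_hi": 136, "cardio": 1},
]

print(round(bmi(70, 175), 1))  # 22.9
print(sensitivity(patients))   # 0.75
```

Sensitivity counts only confirmed patients, so it says nothing about how many healthy people the label flags by mistake. Reporting specificity alongside it would complete the picture.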
![Graph 2](https://cdn-uploads.huggingface.co/production/uploads/69bbf73b0b92270da8940a68/DW5zkV4DEpUjNyoQjn5k0.png)

### 3. Question 3

**Research Question:** To what extent does the sensitivity of blood pressure as a diagnostic tool for cardiovascular disease vary across different age groups?

![Screenshot 2026-04-06 215151](https://cdn-uploads.huggingface.co/production/uploads/69bbf73b0b92270da8940a68/miHZbNnuKmw7MO6_o7kTh.png)

This analysis evaluates diagnostic efficacy among confirmed patients. High blood pressure is a highly reliable indicator for the 50-65 age group, but it often fails to identify sick individuals in younger cohorts (ages 29-50). This diagnostic gap shows that relying solely on blood pressure is insufficient. Integrating additional markers such as BMI and cholesterol is essential to minimize missed cases and improve early detection across all age groups.

### 4. Question 4

**Research Question:** Which metabolic and lifestyle factors serve as the most significant predictors of cardiovascular disease in patients who present with clinically normal blood pressure?

![Screenshot 2026-04-06 215450](https://cdn-uploads.huggingface.co/production/uploads/69bbf73b0b92270da8940a68/JMAI8zW1hB1AQZGNG4abu.png)

This analysis examines patients with normal blood pressure to identify the variables that distinguish healthy individuals from confirmed patients. Obesity, high cholesterol, and high glucose are markedly more prevalent in the patient group than among healthy individuals. These findings show that blood pressure alone is an insufficient diagnostic tool, and that incorporating BMI and blood profiles is necessary to identify morbidity within this specific category.

## Conclusion: Answering the Research Question

**Research Question:** Can we effectively identify high-risk cardiovascular patients by combining clinical markers with lifestyle data, and which factors should take priority?
The analysis indicates that high-risk cardiovascular patients can be effectively identified through a combined analytical approach. The findings establish a clear hierarchy for diagnostic priority:

1. **Primary Priority:** **Age-Adjusted Blood Pressure** serves as the most powerful first-line filter, correctly identifying **80%** of confirmed patients.
2. **Secondary Priority:** In cases where blood pressure appears normal, or among younger cohorts (**ages 29-50**), metabolic markers, specifically **BMI, Cholesterol, and Glucose**, must take priority.

**Summary:** While clinical thresholds are the strongest predictors, they are not exhaustive. A multi-layered diagnostic model that integrates age-adjusted clinical markers with metabolic and lifestyle data is essential to close the diagnostic gap and ensure early detection.
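The two comparisons behind this hierarchy, sensitivity per age group and marker prevalence among normal-BP patients, could be reproduced along the following lines. This is a minimal sketch under stated assumptions. The field names (`bp`, `obese`, `high_chol`, `high_gluc`) are hypothetical, the choice to place age 50 in the older group is an assumption, and the records are synthetic, so the printed numbers are not the project's results.

```python
from collections import defaultdict


def age_group(age_years):
    """Map an age to one of the two cohorts named in the text."""
    # Assumption: age 50 itself is assigned to the older cohort.
    return "29-50" if age_years < 50 else "50-65"


def sensitivity_by_age(records):
    """Per cohort, the share of confirmed patients labelled 'Too High'."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        if r["cardio"] != 1:
            continue  # sensitivity is measured among confirmed patients only
        group = age_group(r["age"])
        total[group] += 1
        flagged[group] += int(r["bp"] == "Too High")
    return {group: flagged[group] / total[group] for group in total}


def marker_rates_normal_bp(records, markers=("obese", "high_chol", "high_gluc")):
    """Rate of each marker among normal-BP records, split by CVD status.

    Assumes both the healthy and the sick subgroup are non-empty.
    """
    normal = [r for r in records if r["bp"] == "Normal"]
    rates = {}
    for status in (0, 1):
        group = [r for r in normal if r["cardio"] == status]
        rates[status] = {m: sum(r[m] for r in group) / len(group) for m in markers}
    return rates


# Synthetic records for illustration only.
records = [
    {"age": 35, "bp": "Normal", "cardio": 1, "obese": 1, "high_chol": 1, "high_gluc": 0},
    {"age": 42, "bp": "Too High", "cardio": 1, "obese": 0, "high_chol": 0, "high_gluc": 0},
    {"age": 55, "bp": "Too High", "cardio": 1, "obese": 1, "high_chol": 0, "high_gluc": 1},
    {"age": 60, "bp": "Too High", "cardio": 1, "obese": 0, "high_chol": 1, "high_gluc": 0},
    {"age": 45, "bp": "Normal", "cardio": 0, "obese": 0, "high_chol": 0, "high_gluc": 0},
    {"age": 58, "bp": "Normal", "cardio": 0, "obese": 1, "high_chol": 0, "high_gluc": 0},
]

print(sensitivity_by_age(records))      # {'29-50': 0.5, '50-65': 1.0}
print(marker_rates_normal_bp(records))
```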
Provider:
idoelg