Erbil Heart Disease Dataset
Source: www.kaggle.com | Updated: 2022-07-26 | Indexed: 2025-01-09
Download link:
https://www.kaggle.com/hangawqadir/erbil-heart-disease-dataset
Description:
This dataset was prepared directly and manually from patient information in a hospital. All of the data was collected at Medical Help Center, a private hospital and heart center located in Erbil, Iraq. The main aim of building the dataset is to work with local patients’ heart-related information in order to predict whether a patient suffers from heart disease. First, resident doctors and cardiologists recorded the data by hand on paper sheets for patients who visited the doctor. The researcher then carefully transferred all of the data to an Excel sheet. Any row with missing information was removed, so no null values are present in the dataset. Data for around 400 patients with 24 features was collected, but after rows with missing values were filtered out, only 333 patients’ records remained in the dataset. The collected data falls into five categories: patient demographics, patient history, physical examination and symptoms, medical lab tests, and diagnostic features.
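The row-removal step described above can be sketched with the Python standard library. This is a minimal illustration, not the researchers' actual procedure: the three sample rows are invented, only a few column names from the attribute list are used, and `drop_incomplete_rows` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical sample: column names follow the attribute list,
# but the values are invented for illustration only.
SAMPLE_CSV = """Age,Sex,LDL,Target
54,0,130,1
61,1,,0
47,0,98,0
"""


def drop_incomplete_rows(rows):
    """Keep only rows in which every field is a non-empty string."""
    return [
        row
        for row in rows
        if all(isinstance(v, str) and v.strip() != "" for v in row.values())
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
complete = drop_incomplete_rows(rows)
print(len(rows), "rows read,", len(complete), "rows kept")
# prints "3 rows read, 2 rows kept"
```

Dropping whole rows keeps the remaining records free of nulls, at the cost of discarding patients with partial information, which matches the reduction from about 400 to 333 records reported above.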
The features of the dataset were chosen on the recommendations of medical doctors, and suggestions from research papers on effective features related to heart disease were also considered. Common factors or characteristics that contribute to cardiovascular disease were identified. The "Target" field indicates the presence of heart disease in the patient. The patients’ names were removed from the database once it was complete. The whole process involved four files, as follows:
• History and symptoms of some patients, recorded by the resident doctor on paper sheets.
• History and symptoms of some patients, taken from the hospital system.
• LDL-cholesterol test results of patients from the hospital lab.
• Scanned echocardiograms (Echos) and ECGs of patients.
The procedure for building the dataset and each attribute are explained below.
**1. Name of The Patient:** Although this field has been removed from the dataset, it was important for verifying the dataset and cleaning the data. The researcher collected this field on the day the patient visited the hospital. After the data for all features had been entered into Excel sheets, the field was removed from the prepared dataset to protect patients’ privacy.
**2. Age:** Age of patients in years.
**3. Sex:** The gender of the patient (1 refers to female and 0 to male).
**4. Smoking:** Whether the patient smokes (0=No, 1=Yes).
**5. Years:** Number of years of smoking, if the patient is a smoker.
**6. LDL:** LDL-cholesterol level of the patient.
**7. Chp:** Chest pain type (1=Typical angina, 2=Atypical angina, 3=Non-anginal pain, 4=Asymptomatic).
**8. Height:** The height of the patient in cm.
**9. Weight:** The weight of the patient in kg.
**10. FH:** Family history of heart disease.
**11. Active:** Whether the patient is physically active (0=No, 1=Yes).
**12. Lifestyle:** The place of living (1=City, 2=Town, 3=Village).
**13. CI:** Has the patient had a cardiac catheterization or any other intervention on the heart? (0=No, 1=Yes).
**14. HR:** Heart rate of the patient.
**15. DM:** Does the patient have diabetes? (0=No, 1=Yes).
**16. Bpsys:** Systolic blood pressure.
**17. Bpdias:** Diastolic blood pressure.
**18. HTN:** Does the patient suffer from hypertension? (0=No, 1=Yes).
**19. IVSD:** An Echo parameter (interventricular septal thickness in diastole). IVSD is a measurement used to determine Left Ventricular Hypertrophy (LVH).
**20. ECGpatt:** The ECG pattern, in four categories (1=ST-Elevation, 2=ST-Depression, 3=T-Inversion, 4=Normal).
**21. Qwave:** The presence of the Q wave (0=No, 1=Yes).
**22. Target:** Whether the patient suffers from heart disease (0=without heart disease, 1=with heart disease).
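The integer codes in the attribute list can be turned into readable labels with a small lookup table. The sketch below assumes the file's column headers match the attribute names used above (the actual headers in the Kaggle file may differ), and `CODEBOOK` and `decode_record` are hypothetical names introduced for illustration. FH is left out because the description gives no encoding for it.

```python
# Code-to-label mappings copied from the attribute list above.
YES_NO = {0: "no", 1: "yes"}

CODEBOOK = {
    "Sex": {0: "male", 1: "female"},
    "Smoking": YES_NO,
    "Chp": {
        1: "typical angina",
        2: "atypical angina",
        3: "non-anginal pain",
        4: "asymptomatic",
    },
    "Active": YES_NO,
    "Lifestyle": {1: "city", 2: "town", 3: "village"},
    "CI": YES_NO,
    "DM": YES_NO,
    "HTN": YES_NO,
    "ECGpatt": {
        1: "ST-elevation",
        2: "ST-depression",
        3: "T-inversion",
        4: "normal",
    },
    "Qwave": YES_NO,
    "Target": {0: "without heart disease", 1: "with heart disease"},
}


def decode_record(record):
    """Replace coded values with labels; leave other fields unchanged."""
    decoded = {}
    for name, value in record.items():
        mapping = CODEBOOK.get(name)
        if mapping is None:
            decoded[name] = value  # numeric measurement such as Age or LDL
        elif value in mapping:
            decoded[name] = mapping[value]
        else:
            raise ValueError(f"unexpected code {value!r} for field {name!r}")
    return decoded


# Invented example record, for illustration only.
example = {"Age": 58, "Sex": 1, "Smoking": 0, "Chp": 2, "ECGpatt": 4, "Target": 0}
decoded = decode_record(example)
print(decoded)
```

Raising an error on an unknown code, rather than passing it through, makes transcription mistakes from the paper sheets visible early.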
Cite this dataset as:
Hangaw Qadir Ahmed, Shwan Othman Amen, Banan Qasim Rassol, & Ibrahim Ismael Hamad. (2022). *Erbil Heart Disease Dataset* [Data set]. Kaggle. https://doi.org/10.34740/KAGGLE/DSV/3989065
Provider:
Kaggle
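Background and challenges
Background overview
The Erbil Heart Disease Dataset is a heart disease research dataset covering 333 patients with 24 features. The data comes from a private hospital in Erbil, Iraq, was rigorously cleaned, and is intended for predicting heart disease risk. It spans demographics, medical history, physical examination, and laboratory tests.
The above content was collected and summarized by 遇见数据集.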



