
MaxPrestige/Synthetic-Diabetes-Dataset

Hugging Face · Updated 2025-11-24 · Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/MaxPrestige/Synthetic-Diabetes-Dataset
Description:
---
license: mit
task_categories:
- token-classification
- reinforcement-learning
language:
- en
tags:
- health
- diabetes
- glucose
- classification
size_categories:
- 10K<n<100K
---

# ⛽ Synthetic Diabetes Data

This dataset contains features that are helpful for predicting whether a patient has diabetes. The data is compiled into a single CSV file for analysis and model training.

# 📁 Dataset Description

Contains synthetic data based on synthetic patient information.

# Columns

The dataset includes the following columns:

- abdominal_obesity (int)
- alcohol_consumption_per_week (int):
  > The patient's alcohol consumption per week.
- alcohol_group (string):
  > The category the patient is placed in, based on their alcohol consumption.
- bmi (float):
  > The body mass index of the patient.
- bmi_group (string):
  > The category the patient is placed in, based on their BMI.
- cardiovascular_history (int):
  > Indicates whether or not the patient has a history of cardiovascular disease.
- cholesterol_total (int):
  > Sum of all cholesterol types in blood. Indicator of diabetic dyslipidemia, a specific pattern of abnormal blood lipid levels commonly seen in people with diabetes.
  > Total Cholesterol = HDL + LDL + VLDL = HDL + LDL + (Triglycerides / 5)
  > Ideal Level: Total Cholesterol <200 mg/dL
- diabetes_risk_score (float)
- diabetes_stage (string)
- diastolic_bp (int):
  > Pressure when the heart relaxes between beats.
  > Elevated BP is common in diabetes but does not diagnose it.
  > Useful for risk stratification, not for diagnosis.
- education_level (string):
  > The education level of the patient.
- employment_status (string):
  > The employment status of the patient.
- ethnicity (string):
  > The ethnicity of the patient.
- family_history_diabetes (int):
  > Indicates whether or not the patient's family has a history of diabetes.
- gender (string):
  > The gender of the patient.
- glucose_fasting (int):
  > Glucose levels when fasting.
- glucose_postprandial (int):
  > Measure of blood glucose levels about 2 hours after a meal.
  > Elevated post-meal glucose is an early sign of impaired glucose metabolism and type 2 diabetes.
  > Ranges: <18 µU/mL = too low; typical peak: 18–276 mg/dL = healthy insulin response; ≥276 mg/dL = insulin resistance or hyperinsulinemia
- hba1c (float):
  > Hemoglobin A1c
  > Average blood sugar over 2–3 months.
  > Diagnostic use: HbA1c 5.7–6.4% = prediabetes. HbA1c ≥6.5% = diabetes
  > Strength: Does not require fasting and is widely used for diagnosis and monitoring.
- hdl_cholesterol (int):
  > High-Density Lipoprotein Cholesterol: “Good” cholesterol. Removes excess cholesterol from arteries. Higher HDL = better.
  > Ideal Level: >40 mg/dL (men), >50 mg/dL (women)
- heart_rate (int):
  > Diabetes can cause autonomic neuropathy, leading to abnormal heart rate or heart rate variability.
  > Not diagnostic, but may indicate complications.
- hypertension_history (int):
  > Indicates whether or not the patient has a history of hypertension.
- income_level (string):
  > The income bracket of the patient.
- insulin_level (float):
  > Units: µU/mL (micro-units per milliliter) or mIU/L (milli-international units per liter). Units are interchangeable: 1 µU/mL = 1 mIU/L
  > Fasting Insulin (after 8–12 hours without food):
  > - Normal: 2–25 µU/mL
  > - Optimal: Many experts suggest <10 µU/mL for best metabolic health
  > - Above 25 µU/mL: May indicate insulin resistance or hyperinsulinemia
  > - Below 2 µU/mL: May indicate type 1 diabetes or pancreatic dysfunction
  >
  > Postprandial (after eating, usually 30–90 minutes):
  > - Normal peak: 18–276 µU/mL
  > - Should return to baseline within 2–3 hours
  > - Higher spikes often occur after high-carb meals
- ldl_cholesterol (int):
  > Low-Density Lipoprotein Cholesterol: “Bad” cholesterol.
  > Deposits cholesterol in artery walls, leading to plaque buildup. Lower LDL = better.
  > Ideal Level: <100 mg/dL
- physical_activity_minutes_per_week (int):
  > The average time spent on physical activity per week (in minutes).
- screen_time_hours_per_day (float):
  > The time (in hours) that the patient spends using technology per day.
- sleep_hours_per_day (float):
  > The amount of time (in hours) the patient sleeps per night.
- smoking_status (string):
  > The smoking status of the patient.
- systolic_bp (int):
  > Pressure when the heart contracts and pumps blood.
- triglycerides (int):
  > A type of fat in blood. High levels are often linked to insulin resistance and diabetes.
  > Ideal Level: <150 mg/dL
- waist_to_hip_ratio (float):
  > The waist circumference divided by the hip circumference.
- diagnosed_diabetes (int):
  > Label/Target Column

# 📚 Data Sources

The data in this project comes from the following source: [Diabetes EDA Analysis](https://huggingface.co/datasets/guyshilo12/diabetes_eda_analysis)
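The reference ranges quoted in the column descriptions can be turned into small helper functions. The following is a minimal sketch, not part of the dataset itself: the function names, the example row, and the label for HbA1c values under 5.7% are hypothetical, and it assumes `insulin_level` is a fasting measurement, which the card does not state.

```python
def estimated_total_cholesterol(hdl: float, ldl: float, triglycerides: float) -> float:
    """Total cholesterol in mg/dL as HDL + LDL + (Triglycerides / 5)."""
    return hdl + ldl + triglycerides / 5


def hba1c_category(hba1c: float) -> str:
    """Map an HbA1c percentage to the ranges quoted in the column notes.

    The card lists 5.7-6.4% as prediabetes and >=6.5% as diabetes; values
    between 6.4 and 6.5 are treated as prediabetes here (an assumption).
    """
    if hba1c >= 6.5:
        return "diabetes"
    if hba1c >= 5.7:
        return "prediabetes"
    return "below prediabetes range"  # hypothetical label, not from the card


def fasting_insulin_flag(insulin: float) -> str:
    """Flag a fasting insulin value (µU/mL) against the 2-25 µU/mL range."""
    if insulin < 2:
        return "low"
    if insulin > 25:
        return "high"
    return "normal"


# Hypothetical row using the dataset's column names; the values are made up.
row = {
    "hdl_cholesterol": 50,
    "ldl_cholesterol": 100,
    "triglycerides": 150,
    "hba1c": 6.0,
    "insulin_level": 12.0,
}

summary = {
    "estimated_total_cholesterol": estimated_total_cholesterol(
        row["hdl_cholesterol"], row["ldl_cholesterol"], row["triglycerides"]
    ),
    "hba1c_category": hba1c_category(row["hba1c"]),
    "fasting_insulin_flag": fasting_insulin_flag(row["insulin_level"]),
}
print(summary)
```

Helpers like these are meant for sanity-checking rows against the stated ranges, for example comparing the estimate with the `cholesterol_total` column. They are not a substitute for clinical judgment, and because the data is synthetic there is no guarantee that rows follow these relationships.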

Provider:
MaxPrestige