MaxPrestige/Synthetic-Diabetes-Dataset
Hugging Face · Updated 2025-11-24 · Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/MaxPrestige/Synthetic-Diabetes-Dataset
---
license: mit
task_categories:
- token-classification
- reinforcement-learning
language:
- en
tags:
- health
- diabetes
- glucose
- classification
size_categories:
- 10K<n<100K
---
# ⛽ Synthetic Diabetes Data
This dataset contains features that help predict whether a patient has diabetes. The data is compiled into a single CSV file for analysis and model training.
# 📁 Dataset Description
Contains synthetic data generated from simulated patient information.
# Columns
The dataset includes the following columns:
- abdominal_obesity (int)
- alcohol_consumption_per_week (int):
> The amount of alcohol consumption per week of the patient.
- alcohol_group (string):
> The category assigned to the patient based on their alcohol consumption.
- bmi (float):
> The body mass index of the patient.
- bmi_group (string):
> The category assigned to the patient based on their BMI.
- cardiovascular_history (int):
> Indicates whether the patient has a history of cardiovascular disease.
- cholesterol_total (int):
> Sum of all cholesterol types in the blood. It is an indicator of diabetic dyslipidemia, the pattern of abnormal blood lipid levels commonly seen in people with diabetes.
> Total Cholesterol = HDL + LDL + VLDL ≈ HDL + LDL + (Triglycerides / 5). Here Triglycerides / 5 is the Friedewald estimate of VLDL, valid for values in mg/dL.
> Ideal level: <200 mg/dL
- diabetes_risk_score (float)
- diabetes_stage (string)
- diastolic_bp (int):
> Pressure when the heart relaxes between beats.
> Elevated BP is common in diabetes but does not diagnose it.
> Useful for risk stratification, not for diagnosis.
- education_level (string):
> The education level of the patient.
- employment_status (string):
> The employment status of the patient.
- ethnicity (string):
> The ethnicity of the patient.
- family_history_diabetes (int):
> Indicates whether or not the patient's family has history of diabetes.
- gender (string):
> The gender of the patient.
- glucose_fasting (int):
> Blood glucose level measured while fasting.
- glucose_postprandial (int):
> Measure of blood glucose levels ~ 2 hours after a meal.
> Elevated post-meal glucose is an early sign of impaired glucose metabolism and type 2 diabetes.
> Reference ranges (2-hour values, as used in the oral glucose tolerance test): <140 mg/dL = normal; 140–199 mg/dL = impaired glucose tolerance (prediabetes); ≥200 mg/dL = diabetes
- hba1c (float):
> Hemoglobin A1c
> Average blood sugar over 2–3 months.
> Diagnostic use: HbA1c 5.7–6.4% = prediabetes. HbA1c ≥6.5% = diabetes
> Strength: Does not require fasting and is widely used for diagnosis and monitoring.
- hdl_cholesterol (int):
> High-Density Lipoprotein Cholesterol: “Good” cholesterol. Removes excess cholesterol from arteries. Higher HDL = better.
> Ideal Level: >40 mg/dL (men), >50 mg/dL (women)
- heart_rate (int):
> Diabetes can cause autonomic neuropathy, leading to abnormal heart rate or variability.
> Not diagnostic, but may indicate complications.
- hypertension_history (int):
> Indicates whether or not the patient has a history of hypertension.
- income_level (string):
> The income bracket of the patient.
- insulin_level (float):
> Units: µU/mL (micro-units per milliliter) or mIU/L (milli-international units per liter). Units are interchangeable: 1 µU/mL = 1 mIU/L
> Fasting Insulin (after 8–12 hours without food):
> - Normal: 2–25 µU/mL
> - Optimal: Many experts suggest <10 µU/mL for best metabolic health
> - Above 25 µU/mL: May indicate insulin resistance or hyperinsulinemia
> - Below 2 µU/mL: May indicate type 1 diabetes or pancreatic dysfunction
> Postprandial (after eating, usually 30–90 minutes):
> - Normal peak: 18–276 µU/mL
> - Should return to baseline within 2–3 hours
> - Higher spikes often occur after high-carb meals
- ldl_cholesterol (int):
> Low-Density Lipoprotein Cholesterol: “Bad” cholesterol.
> Deposits cholesterol in artery walls, leading to plaque buildup. Lower LDL = better.
> Ideal Level: <100 mg/dL
- physical_activity_minutes_per_week (int):
> Average time spent on physical activity per week, in minutes.
- screen_time_hours_per_day (float):
> Average time (in hours) per day that the patient spends looking at screens.
- sleep_hours_per_day (float):
> Average number of hours the patient sleeps per night.
- smoking_status (string):
> The smoking status of the patient.
- systolic_bp (int):
> Pressure in the arteries when the heart contracts during a beat.
> Elevated BP is common in diabetes but does not diagnose it.
> Useful for risk stratification, not for diagnosis.
- triglycerides (int):
> A type of fat in the blood. High levels are often linked to insulin resistance and diabetes.
> Ideal Level: <150 mg/dL
- waist_to_hip_ratio (float):
> The waist circumference divided by the hip circumference.
- diagnosed_diabetes (int):
> Label/target column indicating whether the patient has been diagnosed with diabetes.
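Several column notes give reference thresholds that translate directly into checks. This is a minimal Python sketch that mirrors the card's HbA1c bands, the total-cholesterol formula, the "ideal level" lipid targets, and the fasting insulin range. The function names are hypothetical. The sketch makes two assumptions the card does not state: that `insulin_level` holds a fasting measurement, and that sex is supplied as a boolean. These checks are for exploring the synthetic data, not for diagnosis.

```python
def hba1c_category(hba1c_pct: float) -> str:
    """Map an HbA1c percentage to the card's bands: 5.7-6.4% prediabetes, >=6.5% diabetes."""
    if hba1c_pct >= 6.5:
        return "diabetes"
    if hba1c_pct >= 5.7:
        return "prediabetes"
    return "normal"

def estimated_total_cholesterol(hdl: float, ldl: float, triglycerides: float) -> float:
    """HDL + LDL + VLDL, with VLDL estimated as triglycerides / 5 (all values in mg/dL)."""
    return hdl + ldl + triglycerides / 5

def lipid_targets_met(total: float, hdl: float, ldl: float,
                      triglycerides: float, is_female: bool) -> dict:
    """Compare each lipid value with the card's ideal level.

    `is_female` is a hypothetical boolean; derive it from the `gender` column yourself.
    """
    hdl_floor = 50 if is_female else 40  # card: >50 mg/dL (women), >40 mg/dL (men)
    return {
        "cholesterol_total": total < 200,
        "hdl_cholesterol": hdl > hdl_floor,
        "ldl_cholesterol": ldl < 100,
        "triglycerides": triglycerides < 150,
    }

def fasting_insulin_band(insulin_uU_per_mL: float) -> str:
    """Classify insulin (µU/mL, equal to mIU/L) against the fasting range 2-25 µU/mL.

    Assumes the value is a fasting measurement; the card does not say which it is.
    """
    if insulin_uU_per_mL < 2:
        return "low"
    if insulin_uU_per_mL > 25:
        return "high"
    return "normal"
```

One possible use is to compare `estimated_total_cholesterol(...)` with the `cholesterol_total` column. This shows how closely the synthetic values follow the formula, although the card does not say whether they were generated from it.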
# 📚 Data Sources
The data in this project is sourced from [Diabetes EDA Analysis](https://huggingface.co/datasets/guyshilo12/diabetes_eda_analysis).
Provided by: MaxPrestige



