LinderStacy-1/maternal-health-pregnancy

Source: Hugging Face; updated 2026-03-27; indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/LinderStacy-1/maternal-health-pregnancy
Resource description:
---
license: cc-by-4.0
task_categories:
- tabular-classification
- tabular-regression
language:
- en
tags:
- synthetic
- healthcare
- maternal-health
- pregnancy
- antenatal-care
- preeclampsia
- gestational-diabetes
- anemia
- who-guidelines
- lmic
pretty_name: "Synthetic Maternal Health & Pregnancy Complications Dataset (ANC)"
size_categories:
- 10K<n<100K
configs:
- config_name: low_burden
  data_files: data/maternal_low_burden.csv
- config_name: moderate_burden
  data_files: data/maternal_moderate_burden.csv
  default: true
- config_name: high_burden
  data_files: data/maternal_high_burden.csv
---

# Synthetic Maternal Health & Pregnancy Complications Dataset

**The complete bundle**, including the full dataset (35,000 rows), a trained XGBoost model (AUC-ROC: 0.990), a fully executed notebook, and the full paper, is available on Gumroad:

👉 **[Get the full bundle on Gumroad for $30](https://kossisoro.gumroad.com/l/mphrsb)**

---

## Abstract

This dataset provides **30,000 synthetic records** (10,000 per scenario) of pregnant women attending antenatal care (ANC) in LMIC facility settings. Each record contains 16 clinically relevant variables:

- demographics (age, gravidity, parity)
- clinical measurements (blood pressure, hemoglobin, fasting glucose, BMI, proteinuria)
- risk factors (HIV status)
- outcomes (primary complication, pregnancy outcome, risk level)

All distributions are parameterized from WHO ANC guidelines, the Lancet Maternal Health series, UNAIDS, the IDF Diabetes Atlas, and DHS surveys. Three burden scenarios (low, moderate, high) capture the spectrum from well-resourced urban facilities to under-resourced, high-HIV settings.

## 1. Introduction

Maternal mortality remains unacceptably high in LMICs, with approximately 287,000 deaths annually (WHO 2023). Hypertensive disorders (preeclampsia/eclampsia), obstetric hemorrhage, and sepsis account for over 50% of maternal deaths (Say et al., 2014).
Anemia affects 40-60% of pregnant women in Sub-Saharan Africa, and gestational diabetes prevalence is rising across LMICs. Open-access clinical datasets on maternal health from LMIC contexts are scarce because of privacy regulations and fragmented health information systems. This synthetic dataset fills that gap for:

- Training ML models for pregnancy risk stratification
- Benchmarking complication prediction algorithms
- Prototyping ANC clinical decision support tools
- Educational use in global maternal health curricula

**This dataset is entirely synthetic. It must not be used for clinical decision-making.**

## 2. Methodology

### 2.1 Target Population

Pregnant women aged 15-49 presenting for ANC or delivery at LMIC health facilities.

### 2.2 Epidemiological Parameterization

| Parameter | Value | Source |
| --- | --- | --- |
| Preeclampsia prevalence | 2-10% by setting | Abalos et al., Hypertension in Pregnancy 2013 |
| Eclampsia incidence | 0.1-1.5% | Abalos et al., 2013 |
| GDM prevalence | 3-8% in LMIC | IDF Diabetes Atlas 2021 |
| Obstetric hemorrhage | 2-8% | Say et al., Lancet Global Health 2014 |
| Anemia in pregnancy (Hb < 11 g/dL) | 40-65% in SSA | Stevens et al., Lancet Global Health 2013 |
| Severe anemia (Hb < 7 g/dL) | 2-15% by setting | Stevens et al., 2013 |
| HIV prevalence (women) | 4-18% by setting | UNAIDS 2023 |
| C-section rate (LMIC) | 5-18% | DHS Program; Vogel et al., 2014 |

### 2.3 Scenario Design

| Scenario | Context | Preeclampsia | Eclampsia | GDM | Hemorrhage | Severe Anemia | HIV |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Low burden | Urban LMIC, functional ANC | 4.2% | 0.4% | 6.8% | 2.6% | 2.9% | 4.2% |
| Moderate burden | District hospital | 7.5% | 1.0% | 10.6% | 3.9% | 5.8% | 8.0% |
| High burden | Under-resourced / high-HIV | 11.6% | 2.1% | 13.5% | 6.0% | 9.9% | 18.1% |

### 2.4 Risk Factor Modelling

Complication probabilities are adjusted by individual risk factors:

- **Preeclampsia**: OR ~1.8 for age >35, OR ~2.0 for BMI >30, OR ~1.5
  for primigravida (WHO 2016)
- **GDM**: OR ~1.5 for age >30, OR ~2.5 for BMI >25 (IDF 2021)
- **Blood pressure**: Conditional on complication, with age and BMI adjustments
- **Hemoglobin**: Background anemia prevalence plus complication-specific shifts

## 3. Dataset Description

### 3.1 Schema

| Column | Type | Units | Range | Description |
| --- | --- | --- | --- | --- |
| id | int | - | 1-10000 | Unique identifier |
| age_years | int | years | 15-49 | Maternal age |
| gravidity | int | - | 1-16 | Total pregnancies, including current |
| parity | int | - | 0-15 | Previous deliveries |
| gestational_age_weeks | float | weeks | 6.0-42.0 | Gestational age at clinical visit |
| bmi_pre_pregnancy | float | kg/m² | 14.0-48.0 | Pre-pregnancy BMI |
| systolic_bp_mmhg | int | mmHg | 70-220 | Systolic blood pressure |
| diastolic_bp_mmhg | int | mmHg | 40-140 | Diastolic blood pressure |
| hemoglobin_gdl | float | g/dL | 3.0-17.0 | Hemoglobin concentration |
| anemia_status | categorical | - | none/mild/moderate/severe | WHO pregnancy anemia classification |
| fasting_glucose_mgdl | int | mg/dL | 45-250 | Fasting blood glucose |
| proteinuria | ordinal | - | 0-4 | Urine protein (0 = none, 4 = ≥+3) |
| hiv_status | binary | - | 0/1 | HIV serostatus |
| anc_visits | int | - | 0-15 | Number of ANC visits to date |
| delivery_mode | categorical | - | vaginal/caesarean | Mode of delivery |
| primary_complication | categorical | - | 6 classes | Primary pregnancy complication |
| pregnancy_outcome | categorical | - | live_birth/stillbirth/maternal_death | Pregnancy outcome |
| risk_level | categorical | - | low/moderate/high | Composite risk classification |

### 3.2 Classification Criteria

| Classification | Criteria | Source |
| --- | --- | --- |
| Anemia (mild) | Hb 10.0-10.9 g/dL | WHO 2011 |
| Anemia (moderate) | Hb 7.0-9.9 g/dL | WHO 2011 |
| Anemia (severe) | Hb < 7.0 g/dL | WHO 2011 |
| Hypertension | SBP ≥140 or DBP ≥90 mmHg | WHO ANC 2016 |
| Severe hypertension | SBP ≥160 or DBP ≥110 mmHg | WHO ANC 2016 |
| GDM (fasting) | Fasting glucose ≥92 mg/dL | IADPSG/WHO 2013 |

## 4. Validation

### 4.1 Cross-Scenario Monotonicity

Every adverse indicator increases monotonically from low → moderate → high burden:

- anemia: 48% → 58% → 65%
- hypertension: 5% → 9% → 14%
- HIV: 4% → 8% → 18%
- stillbirth: 0.6% → 0.8% → 1.4%

### 4.2 Diagnostic Plots

<p align="center">
  <img src="validation_report.png" alt="Validation Report" width="100%">
</p>

## 5. Usage

### 5.1 Loading with Hugging Face `datasets`

```python
from datasets import load_dataset

# Config names: low_burden, moderate_burden (default), high_burden
dataset = load_dataset("electricsheepafrica/synthetic-maternal-pregnancy-complications-WHO-ANC", "moderate_burden")
df = dataset["train"].to_pandas()
```

### 5.2 Loading directly from CSV

```python
import pandas as pd

df = pd.read_csv("data/maternal_moderate_burden.csv")
# Share of records in the high composite-risk class
high_risk = df[df["risk_level"] == "high"]
print(f"High risk: {len(high_risk) / len(df) * 100:.1f}%")
```

### 5.3 Regenerating

```bash
pip install numpy pandas scipy matplotlib
python generate_dataset.py --all-scenarios --n 10000 --seed 42
python validate_dataset.py
```

## 6. Limitations & Ethical Considerations

- **Synthetic data**: No real patients. Not for clinical use.
- **Simplified comorbidity**: Each woman has one primary complication, whereas real pregnancies often involve multiple concurrent conditions.
- **No temporal modelling**: Each record is a single-timepoint snapshot and does not capture ANC trajectory or disease progression.
- **HIV simplification**: HIV status is modelled as binary. Viral load, ART status, and CD4 count are not captured.
- **Geographic generalization**: Parameters are drawn from pooled LMIC estimates and may not precisely represent any single country.

## 7. References

1. WHO (2016). WHO Recommendations on Antenatal Care for a Positive Pregnancy Experience. Geneva.
2. Say L, et al. (2014). Global causes of maternal death. *Lancet Global Health*, 2(6):e323-333.
3. Abalos E, et al. (2013). Global and regional estimates of preeclampsia and eclampsia.
   *Hypertension in Pregnancy*, 32(sup1):36.
4. IDF (2021). *IDF Diabetes Atlas*, 10th edition.
5. Stevens GA, et al. (2013). Global, regional, and national trends in haemoglobin concentration. *Lancet Global Health*, 1(1):e16-25.
6. UNAIDS (2023). Global HIV & AIDS statistics fact sheet.
7. WHO (2023). Trends in maternal mortality 2000-2020. Geneva.
8. DHS Program. Demographic and Health Surveys, multiple countries.
9. Vogel JP, et al. (2014). Use of the Robson classification. *Lancet Global Health*, 2(5):e260-270.
10. Souza JP, et al. (2013). Moving beyond essential interventions. *Lancet*, 381(9879):1747-1755.

## Citation

```bibtex
@dataset{esa_maternal_2025,
  title={Synthetic Maternal Health and Pregnancy Complications Dataset},
  author={Electric Sheep Africa},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/electricsheepafrica/synthetic-maternal-pregnancy-complications-WHO-ANC}
}
```

## License

This dataset is released under the [Creative Commons Attribution 4.0 International (CC-BY-4.0)](https://creativecommons.org/licenses/by/4.0/) license.
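Section 2.4 lists odds ratios, but the card does not say how they are combined with the scenario prevalences in Section 2.3. One common approach works in three steps:

1. Convert a baseline probability to odds.
2. Multiply by the odds ratio of every risk factor present.
3. Convert the result back to a probability.

What follows is a minimal sketch of that approach, assuming a simple multiplicative model on the odds scale. It is not the logic of `generate_dataset.py`. The helper names `risk_flags` and `adjusted_probability` are hypothetical, as are the flag keys in the OR dictionaries. The cut-offs and OR values are copied from Section 2.4.

```python
from typing import Dict, Mapping

# Odds ratios from Section 2.4; the flag names used as keys are hypothetical.
PREECLAMPSIA_OR: Dict[str, float] = {
    "age_over_35": 1.8,
    "bmi_over_30": 2.0,
    "primigravida": 1.5,
}
GDM_OR: Dict[str, float] = {
    "age_over_30": 1.5,
    "bmi_over_25": 2.5,
}


def risk_flags(age_years: int, bmi_pre_pregnancy: float, gravidity: int) -> Dict[str, bool]:
    """Derive the boolean risk factors referenced by the OR tables from schema columns."""
    return {
        "age_over_30": age_years > 30,
        "age_over_35": age_years > 35,
        "bmi_over_25": bmi_pre_pregnancy > 25,
        "bmi_over_30": bmi_pre_pregnancy > 30,
        # Gravidity counts the current pregnancy, so 1 means primigravida.
        "primigravida": gravidity == 1,
    }


def adjusted_probability(
    baseline_p: float,
    odds_ratios: Mapping[str, float],
    flags: Mapping[str, bool],
) -> float:
    """Multiply the baseline odds by the OR of each risk factor present; return a probability."""
    if not 0.0 < baseline_p < 1.0:
        raise ValueError("baseline_p must be strictly between 0 and 1")
    odds = baseline_p / (1.0 - baseline_p)
    for factor, odds_ratio in odds_ratios.items():
        # Factors absent from `flags` are treated as not present.
        if flags.get(factor, False):
            odds *= odds_ratio
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Moderate-burden preeclampsia rate (7.5%, Section 2.3) used as the baseline.
    flags = risk_flags(age_years=38, bmi_pre_pregnancy=32.0, gravidity=1)
    print(round(adjusted_probability(0.075, PREECLAMPSIA_OR, flags), 3))  # 0.305
```

This sketch treats the scenario prevalence as the probability for a woman with no risk factors. Under that choice, the population-level rate will exceed the Section 2.3 target whenever risk factors are common. A generator that must reproduce those targets exactly would need to recalibrate the baseline odds.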
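Section 3.2 sets the WHO cut-offs behind `anemia_status` and the hypertension and fasting-GDM definitions. The helpers below are a minimal sketch, not part of the dataset's own tooling. They apply those thresholds so that derived labels can be checked against the raw measurements.

Two assumptions are worth flagging:

- The anemia bands are treated as half-open intervals, so a value such as 10.95 g/dL counts as mild.
- The blood-pressure labels `normotensive`, `hypertension`, and `severe_hypertension` are hypothetical, because the schema has no blood-pressure category column.

```python
from typing import Dict, Iterable, List


def who_anemia_category(hb_gdl: float) -> str:
    """Return the anemia_status label implied by the WHO (2011) cut-offs in Section 3.2."""
    if hb_gdl < 7.0:
        return "severe"
    if hb_gdl < 10.0:
        return "moderate"
    if hb_gdl < 11.0:
        return "mild"
    return "none"


def bp_category(systolic_bp_mmhg: int, diastolic_bp_mmhg: int) -> str:
    """Apply the WHO ANC 2016 thresholds; the returned labels are hypothetical."""
    if systolic_bp_mmhg >= 160 or diastolic_bp_mmhg >= 110:
        return "severe_hypertension"
    if systolic_bp_mmhg >= 140 or diastolic_bp_mmhg >= 90:
        return "hypertension"
    return "normotensive"


def meets_fasting_gdm_threshold(fasting_glucose_mgdl: float) -> bool:
    """IADPSG/WHO 2013 fasting criterion from Section 3.2: at least 92 mg/dL."""
    return fasting_glucose_mgdl >= 92


def anemia_label_mismatches(rows: Iterable[Dict]) -> List[int]:
    """Return the ids of rows whose anemia_status disagrees with hemoglobin_gdl."""
    return [
        row["id"]
        for row in rows
        if who_anemia_category(row["hemoglobin_gdl"]) != row["anemia_status"]
    ]


if __name__ == "__main__":
    # Illustrative rows only; these values are not taken from the dataset.
    sample = [
        {"id": 1, "hemoglobin_gdl": 11.8, "anemia_status": "none"},
        {"id": 2, "hemoglobin_gdl": 9.4, "anemia_status": "moderate"},
        {"id": 3, "hemoglobin_gdl": 6.2, "anemia_status": "moderate"},
    ]
    print(anemia_label_mismatches(sample))  # [3]
```

With a CSV loaded into `df` as in Section 5.2, there are two ways to use these helpers:

- `df["hemoglobin_gdl"].map(who_anemia_category)` yields the expected label for every row.
- `anemia_label_mismatches(df.to_dict("records"))` lists the ids that disagree.

If the generator rounds hemoglobin differently at the band edges, a few boundary mismatches would not by themselves indicate an error.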
Provided by:
LinderStacy-1