
electricsheepafrica/malaria-diagnosis-subsaharan-africa-2024

Source: Hugging Face. Updated 2026-03-27. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/malaria-diagnosis-subsaharan-africa-2024
Resource description:
---
dataset_info:
  features:
  - name: age
    dtype: float64
  - name: temperature_celsius
    dtype: float64
  - name: fever_duration_days
    dtype: float64
  - name: parasite_density
    dtype: float64
  - name: haemoglobin_gdl
    dtype: float64
  - name: sex
    dtype: string
  - name: age_group
    dtype: string
  - name: symptom_headache
    dtype: string
  - name: symptom_chills
    dtype: string
  - name: symptom_vomiting
    dtype: string
  - name: symptom_diarrhoea
    dtype: string
  - name: symptom_joint_pain
    dtype: string
  - name: prior_antimalarial
    dtype: string
  - name: itn_use
    dtype: string
  - name: country
    dtype: string
  - name: location_type
    dtype: string
  - name: season
    dtype: string
  - name: rdt_result
    dtype: string
  - name: microscopy_result
    dtype: string
  - name: plasmodium_species
    dtype: string
  - name: treatment_outcome
    dtype: string
  - name: confirmed_malaria
    dtype: int64
  splits:
  - name: train
    num_examples: 499
  - name: test
    num_examples: 99
task_categories:
- tabular-classification
language:
- en
tags:
- africa
- nigeria
- kenya
- malaria-diagnostics
- synthetic
- machine-learning
- electric-sheep-africa
license: other
---

# Malaria Clinical Diagnosis & RDT Outcomes Bundle: Teaser Dataset

This is the **public teaser** of the Malaria Clinical Diagnosis & RDT Outcomes Bundle. It contains the full schema, documentation, and a **598-row sample** (499 train rows and 99 test rows).

**The complete bundle**, including the full dataset (30,000 rows), a trained XGBoost model (AUC-ROC: 1.000), and a fully executed notebook, is available on Gumroad:

👉 **[Get the full bundle on Gumroad](https://kossisoro.gumroad.com/l/malaria-dx)**

---

## Abstract

This pack provides a research-grade, ML-ready dataset for malaria clinical diagnosis and RDT outcome prediction in Sub-Saharan Africa, with a focus on Nigeria and Kenya.
The dataset comprises 30,000 individual-level records (12,000 real-base + 18,000 synthetic augmentation) across 22 columns (21 features plus the `confirmed_malaria` target) spanning demographics, clinical presentation (fever, symptoms, haemoglobin), diagnostic test results (RDT, microscopy), parasite characteristics, and treatment outcomes. Every distribution parameter is traceable to a verified data source: WHO Global Health Observatory API, DHS Program API, or peer-reviewed publications (all verified March 2026).

Key verified statistics anchoring the dataset:

- Malaria incidence: Nigeria 294.25/1,000 [211–397] vs Kenya 74.17/1,000 [37–131] (WHO GHO 2024)
- RDT test positivity: Nigeria 70.7% vs Kenya 43.0% (computed from WHO GHO 2024)
- RDT prevalence in children under 5: Nigeria 39.6% vs Kenya 4.4% (DHS API)

The pack includes a baseline XGBoost diagnostic classifier, ONNX export for edge deployment, an inference wrapper, and full paper-style documentation. The Timber C99 compilation story makes this pack well suited to demonstrating embedded ML deployment on low-power diagnostic devices.

---

## Dataset Card

| Attribute | Value |
|---|---|
| **Full dataset rows** | 30,000 (12,000 real + 18,000 synthetic) |
| **Teaser rows** | 598 (this download: 499 train + 99 test) |
| **Features** | 21 |
| **Target** | `confirmed_malaria` |
| **Geography** | Nigeria, Kenya |
| **Model AUC-ROC** | 1.000 (on held-out test set, real data only) |

---

## Methodology Summary

All synthetic distribution parameters are grounded in peer-reviewed sources. Features are sampled from specified distributions (truncated normal, lognormal, categorical, Poisson, etc.) with parameters extracted from published literature. Validation and test sets contain real data only, to preserve evaluation integrity. See the full README in the Gumroad bundle for the complete methodology.

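
As a rough illustration of the sampling approach described in the Methodology Summary, the sketch below draws one synthetic record from a truncated normal, a lognormal, a Poisson, and a categorical distribution using only the Python standard library. The pairing of each column with a distribution, every numeric parameter, and the `season` category labels are placeholders chosen for illustration. They are not the parameters used to build this dataset, which are documented only in the full bundle.

```python
import math
import random


def truncated_normal(rng, mean, sd, low, high):
    """Rejection-sample a normal draw until it falls inside [low, high]."""
    while True:
        x = rng.gauss(mean, sd)
        if low <= x <= high:
            return x


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_record(rng):
    # All parameters and category labels below are hypothetical placeholders,
    # NOT the values used by the dataset authors.
    return {
        # Truncated normal: body temperature bounded to a plausible range.
        "temperature_celsius": round(truncated_normal(rng, 38.0, 1.0, 35.0, 42.0), 1),
        # Lognormal: strictly positive and right-skewed.
        "parasite_density": rng.lognormvariate(8.0, 1.5),
        # Poisson: non-negative count, stored as float to match the schema dtype.
        "fever_duration_days": float(poisson(rng, 3.0)),
        # Categorical: weighted choice between assumed labels.
        "season": rng.choices(["wet", "dry"], weights=[0.6, 0.4])[0],
    }


rng = random.Random(0)  # fixed seed so the draws are reproducible
records = [sample_record(rng) for _ in range(200)]
print(records[0])
```

Rejection sampling keeps the truncated normal simple at the cost of extra draws when the bounds are tight. A library such as SciPy offers a direct truncated-normal sampler if that becomes a bottleneck.
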

---

## Limitations

- Geographic scope is limited to Nigeria and Kenya
- Synthetic data may not capture complex multivariate interactions
- Not intended for direct production deployment without live data validation
- See the full README in the Gumroad bundle for comprehensive limitations

---

## Citation

```bibtex
@dataset{esa_malaria_diagnosis_subsaharan_africa_2024_2026,
  author    = {{Electric Sheep Africa}},
  title     = {Malaria Clinical Diagnosis \& RDT Outcomes Bundle},
  year      = {2026},
  version   = {1.0.0},
  publisher = {Gumroad},
}
```

---

*Electric Sheep Africa: Building Africa's AI data layer.*
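
The column names and dtypes declared in the `dataset_info` front matter can be turned into a lightweight row check before training. The sketch below is one way to do that in plain Python; the mapping from `float64`/`string`/`int64` to Python types, the helper name `schema_problems`, and the placeholder row values are assumptions for illustration and are not part of the bundle.

```python
# Column names and split sizes are copied from the dataset_info front matter.
FLOAT_COLS = [
    "age", "temperature_celsius", "fever_duration_days",
    "parasite_density", "haemoglobin_gdl",
]
STRING_COLS = [
    "sex", "age_group", "symptom_headache", "symptom_chills",
    "symptom_vomiting", "symptom_diarrhoea", "symptom_joint_pain",
    "prior_antimalarial", "itn_use", "country", "location_type", "season",
    "rdt_result", "microscopy_result", "plasmodium_species",
    "treatment_outcome",
]
INT_COLS = ["confirmed_malaria"]  # the prediction target

# Assumed dtype mapping: float64 -> float, string -> str, int64 -> int.
EXPECTED = {
    **{name: float for name in FLOAT_COLS},
    **{name: str for name in STRING_COLS},
    **{name: int for name in INT_COLS},
}
EXPECTED_SPLITS = {"train": 499, "test": 99}


def schema_problems(row):
    """Return a list of human-readable schema violations for one row (hypothetical helper)."""
    problems = []
    for name, py_type in EXPECTED.items():
        if name not in row:
            problems.append(f"missing column: {name}")
        elif not isinstance(row[name], py_type):
            found = type(row[name]).__name__
            problems.append(f"{name}: expected {py_type.__name__}, got {found}")
    for name in row:
        if name not in EXPECTED:
            problems.append(f"unexpected column: {name}")
    return problems


# Placeholder row with made-up values, used only to exercise the check.
placeholder_row = {
    **{name: 0.0 for name in FLOAT_COLS},
    **{name: "placeholder" for name in STRING_COLS},
    "confirmed_malaria": 0,
}

print(len(EXPECTED), "columns;", sum(EXPECTED_SPLITS.values()), "teaser rows")
print(schema_problems(placeholder_row))
```

Rows loaded through pandas or the `datasets` library may carry NumPy scalar types rather than built-in `float` and `int`, so the `isinstance` checks would need loosening in that setting.
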
Provider:
electricsheepafrica