electricsheepafrica/malaria-diagnosis-subsaharan-africa-2024
Source: Hugging Face. Updated 2026-03-27; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/malaria-diagnosis-subsaharan-africa-2024
Resource description:
---
dataset_info:
  features:
  - name: age
    dtype: float64
  - name: temperature_celsius
    dtype: float64
  - name: fever_duration_days
    dtype: float64
  - name: parasite_density
    dtype: float64
  - name: haemoglobin_gdl
    dtype: float64
  - name: sex
    dtype: string
  - name: age_group
    dtype: string
  - name: symptom_headache
    dtype: string
  - name: symptom_chills
    dtype: string
  - name: symptom_vomiting
    dtype: string
  - name: symptom_diarrhoea
    dtype: string
  - name: symptom_joint_pain
    dtype: string
  - name: prior_antimalarial
    dtype: string
  - name: itn_use
    dtype: string
  - name: country
    dtype: string
  - name: location_type
    dtype: string
  - name: season
    dtype: string
  - name: rdt_result
    dtype: string
  - name: microscopy_result
    dtype: string
  - name: plasmodium_species
    dtype: string
  - name: treatment_outcome
    dtype: string
  - name: confirmed_malaria
    dtype: int64
  splits:
  - name: train
    num_examples: 499
  - name: test
    num_examples: 99
task_categories:
- tabular-classification
language:
- en
tags:
- africa
- nigeria
- kenya
- malaria-diagnostics
- synthetic
- machine-learning
- electric-sheep-africa
license: other
---
# Malaria Clinical Diagnosis & RDT Outcomes Bundle — Teaser Dataset
This is the **public teaser** of the Malaria Clinical Diagnosis & RDT Outcomes Bundle.
It contains the full schema, documentation, and a **598-row sample** (499 train rows and 99 test rows).
**The complete bundle**, including the full dataset (30,000 rows), a trained XGBoost model (AUC-ROC: 1.000), and a fully executed notebook, is available on Gumroad:
👉 **[Get the full bundle on Gumroad](https://kossisoro.gumroad.com/l/malaria-dx)**
---
## Abstract
This pack provides a research-grade, ML-ready dataset for malaria clinical diagnosis and RDT outcome prediction in Sub-Saharan Africa, with a focus on Nigeria and Kenya. The dataset comprises 30,000 individual-level records (12,000 real-base records plus 18,000 synthetic augmentation records) across 22 columns (21 features plus the `confirmed_malaria` target) spanning demographics, clinical presentation (fever, symptoms, haemoglobin), diagnostic test results (RDT, microscopy), parasite characteristics, and treatment outcomes. Every distribution parameter is traceable to a verified data source: the WHO Global Health Observatory API, the DHS Program API, or peer-reviewed publications (all verified March 2026). Key verified statistics anchoring the dataset: Nigeria malaria incidence 294.25/1,000 [211–397] vs Kenya 74.17/1,000 [37–131] (WHO GHO 2024); Nigeria RDT test positivity 70.7% vs Kenya 43.0% (computed from WHO GHO 2024); Nigeria RDT prevalence in children <5: 39.6% vs Kenya 4.4% (DHS API). The pack includes a baseline XGBoost diagnostic classifier, ONNX export for edge deployment, an inference wrapper, and full paper-style documentation. The Timber C99 compilation story makes this pack well suited to demonstrating embedded ML deployment on low-power diagnostic devices.
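
The country comparisons quoted in the abstract can be reduced to simple ratios. The sketch below only restates the abstract's figures and does the arithmetic; the dictionary keys and the function name are illustrative assumptions, not part of the bundle.

```python
# Figures as quoted in the abstract (WHO GHO 2024 and DHS API, per the card).
STATS = {
    "incidence_per_1000": {"Nigeria": 294.25, "Kenya": 74.17},
    "rdt_test_positivity_pct": {"Nigeria": 70.7, "Kenya": 43.0},
    "rdt_prevalence_under5_pct": {"Nigeria": 39.6, "Kenya": 4.4},
}

# Record counts as quoted in the abstract.
REAL_ROWS = 12_000
SYNTHETIC_ROWS = 18_000


def nigeria_to_kenya_ratio(metric: str) -> float:
    """Return the Nigeria value divided by the Kenya value for one metric."""
    values = STATS[metric]
    return values["Nigeria"] / values["Kenya"]


# Ratio for each metric, rounded to two decimals for display.
ratios = {metric: round(nigeria_to_kenya_ratio(metric), 2) for metric in STATS}

# Share of the full dataset that is synthetic augmentation.
synthetic_share = SYNTHETIC_ROWS / (REAL_ROWS + SYNTHETIC_ROWS)

for metric, ratio in ratios.items():
    print(f"{metric}: Nigeria is {ratio}x Kenya")
print(f"synthetic share of full dataset: {synthetic_share:.0%}")
```

On these figures, incidence in Nigeria is roughly four times that in Kenya, and synthetic records make up 60% of the full dataset.
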
---
## Dataset Card
| Attribute | Value |
|---|---|
| **Full dataset rows** | 30,000 (12,000 real + 18,000 synthetic) |
| **Teaser rows** | 598 (499 train + 99 test; this download) |
| **Features** | 21 (plus the target column) |
| **Target** | `confirmed_malaria` |
| **Geography** | Nigeria, Kenya |
| **Model AUC-ROC** | 1.000 (on held-out test set, real data only) |
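
The column names and dtypes declared in the card's metadata can be turned into a small schema check. This is a minimal sketch: `EXPECTED_SCHEMA` is copied from the metadata above, while `check_schema` and the idea of passing observed dtypes as a plain dictionary are assumptions made for illustration.

```python
# Column -> dtype mapping, copied from the dataset_info block of this card.
EXPECTED_SCHEMA = {
    "age": "float64",
    "temperature_celsius": "float64",
    "fever_duration_days": "float64",
    "parasite_density": "float64",
    "haemoglobin_gdl": "float64",
    "sex": "string",
    "age_group": "string",
    "symptom_headache": "string",
    "symptom_chills": "string",
    "symptom_vomiting": "string",
    "symptom_diarrhoea": "string",
    "symptom_joint_pain": "string",
    "prior_antimalarial": "string",
    "itn_use": "string",
    "country": "string",
    "location_type": "string",
    "season": "string",
    "rdt_result": "string",
    "microscopy_result": "string",
    "plasmodium_species": "string",
    "treatment_outcome": "string",
    "confirmed_malaria": "int64",
}
TARGET = "confirmed_malaria"

# Everything except the target is treated as a feature.
feature_columns = [name for name in EXPECTED_SCHEMA if name != TARGET]


def check_schema(observed: dict) -> list:
    """Compare an observed column -> dtype mapping with the card's schema.

    Returns a list of human-readable problems; an empty list means a match.
    """
    problems = []
    for name, dtype in EXPECTED_SCHEMA.items():
        if name not in observed:
            problems.append(f"missing column: {name}")
        elif observed[name] != dtype:
            problems.append(f"{name}: expected {dtype}, got {observed[name]}")
    for name in observed:
        if name not in EXPECTED_SCHEMA:
            problems.append(f"unexpected column: {name}")
    return problems


print(len(EXPECTED_SCHEMA), "columns,", len(feature_columns), "features")
```

A loader may report string columns under a different dtype name than `string`, so a mapping step may be needed before calling `check_schema` on real data.
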
---
## Methodology Summary
All synthetic distribution parameters are grounded in peer-reviewed sources. Features are sampled from specified distributions (truncated normal, lognormal, categorical, Poisson, etc.) with parameters extracted from published literature. Validation and test sets contain only real data, to preserve evaluation integrity. See the full README in the Gumroad bundle for complete methodology.
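
To make the sampling step concrete, here is a minimal sketch of drawing one record from the four distribution families named above, using only the Python standard library. Every parameter value, the category labels `rainy` and `dry`, and the helper names are hypothetical placeholders; the actual parameters and generator are documented only in the full bundle.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def truncated_normal(mean, sd, low, high):
    """Rejection sampling: redraw until the value falls inside [low, high]."""
    while True:
        x = rng.gauss(mean, sd)
        if low <= x <= high:
            return x


def poisson(lam):
    """Knuth's multiplication method; adequate for small rates."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_record():
    """Draw one synthetic row. All parameters are hypothetical placeholders."""
    return {
        # truncated normal, clipped to a plausible body-temperature range
        "temperature_celsius": round(truncated_normal(38.0, 1.0, 35.0, 42.0), 1),
        # lognormal: mu and sigma are parameters of the underlying normal
        "parasite_density": round(rng.lognormvariate(8.0, 1.5), 1),
        # Poisson count, stored as float to match the float64 column dtype
        "fever_duration_days": float(poisson(3.0)),
        # categorical draw with hypothetical labels and weights
        "season": rng.choices(["rainy", "dry"], weights=[0.6, 0.4])[0],
    }


records = [sample_record() for _ in range(1000)]
print(records[0])
```

Each field here is drawn independently of the others, which is the simplest reading of the description above; the bundle's generator may couple fields in ways this sketch does not.
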
---
## Limitations
- Geographic scope limited to Nigeria and Kenya
- Synthetic data may not capture complex multivariate interactions
- Not intended for direct production deployment without live data validation
- See full README in the Gumroad bundle for comprehensive limitations
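
The point about multivariate interactions can be illustrated with a toy example: when two linked variables are sampled independently from their marginals, each column keeps its own distribution but the relationship between them disappears. The sketch below uses an invented linear link between temperature and haemoglobin purely for illustration; the coefficients and the `pearson` helper are assumptions, not properties of this dataset.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(42)

# Hypothetical "real" data: haemoglobin falls as temperature rises.
temperature = [rng.gauss(38.0, 1.0) for _ in range(2000)]
haemoglobin = [11.0 - 0.8 * (t - 38.0) + rng.gauss(0.0, 0.5) for t in temperature]

# Emulate independent marginal sampling by shuffling one column:
# its marginal distribution is unchanged, but the pairing is lost.
shuffled_haemoglobin = haemoglobin[:]
rng.shuffle(shuffled_haemoglobin)

corr_joint = pearson(temperature, haemoglobin)
corr_independent = pearson(temperature, shuffled_haemoglobin)

print(f"joint sampling:       r = {corr_joint:.2f}")
print(f"independent sampling: r = {corr_independent:.2f}")
```

Comparing pairwise correlations between real and synthetic rows in this way is one inexpensive check a user could run before relying on the synthetic portion.
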
---
## Citation
```bibtex
@dataset{esa_malaria_diagnosis_subsaharan_africa_2024_2026,
author = {{Electric Sheep Africa}},
title = {Malaria Clinical Diagnosis \& RDT Outcomes Bundle},
year = {2026},
version = {1.0.0},
publisher = {Gumroad},
}
```
---
*Electric Sheep Africa — Building Africa's AI data layer.*
Provider:
electricsheepafrica



