electricsheepafrica/africa-ssd-aah-final-dataset
Source: Hugging Face · Updated 2026-04-20 · Indexed 2026-04-26
Download link: https://hf-mirror.com/datasets/electricsheepafrica/africa-ssd-aah-final-dataset
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- ssd
pretty_name: "A Prospective Evaluation of Nutrition Protocol Adaptations in the Context of the COVID-19 Pandemic in South Sudan"
dataset_info:
splits:
- name: train
num_examples: 887
- name: test
num_examples: 221
---
# A Prospective Evaluation of Nutrition Protocol Adaptations in the Context of the COVID-19 Pandemic in South Sudan
**Publisher:** Johns Hopkins School of Public Health · **Source:** [HDX](https://data.humdata.org/dataset/ssd-aah-final-dataset) · **License:** `cc-by` · **Updated:** 2025-04-15
---
## Abstract
This dataset contains the underlying data for a manuscript pending publication. The manuscript reports results from a non-randomized prospective cohort study comparing acutely malnourished children treated under South Sudan’s standard national CMAM (community-based management of acute malnutrition) protocol with children treated under the COVID-modified CMAM protocol. Outcomes are compared using standard nutrition program indicators (i.e., recovery rates and length of stay).
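The card metadata describes rows as first-level administrative unit observations, but the schema indicates that each row is an individual child record (identified by `id`, with child-level fields such as `child_dob` and `adm_muac`); the only `state` value shown is NBeG. Temporal coverage is indicated by the `child_dob` and `enrollment_date` columns. Geographic scope: **SSD** (South Sudan).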
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
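
The comparison described in the abstract (recovery and length of stay by protocol) can be explored descriptively with a per-arm summary. The following is a minimal sketch, not the manuscript's analysis: it assumes the `protocol`, `outcome`, and `los` columns listed in the schema, and the outcome labels in the toy rows (`cured`, `defaulted`) are hypothetical because the card does not list the actual values of `outcome`.

```python
import pandas as pd

def summarize_by_arm(df: pd.DataFrame, arm_col: str = "protocol") -> pd.DataFrame:
    """Per-arm row count, median length of stay, and outcome shares."""
    # Share of each discharge outcome within each arm (each row sums to 1).
    shares = pd.crosstab(df[arm_col], df["outcome"], normalize="index")
    # Row count and median length of stay per arm.
    stats = df.groupby(arm_col)["los"].agg(n="count", median_los="median")
    return stats.join(shares)

# Toy rows with made-up values, only to show the shape of the result.
toy = pd.DataFrame({
    "protocol": ["covid", "covid", "standard", "standard"],
    "outcome": ["cured", "defaulted", "cured", "cured"],  # hypothetical labels
    "los": [56, 30, 70, 84],
})
summary = summarize_by_arm(toy)
print(summary)
```

On the real data, `summarize_by_arm(train)` or `summarize_by_arm(train, arm_col="studyarm")` would give the same table per arm. Because the study is non-randomized, such raw differences are descriptive only and are not adjusted for differences between the groups.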
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Food security and nutrition |
| **Unit of observation** | Individual child record (listed in the source metadata as first-level administrative unit observations) |
| **Rows (total)** | 1,109 |
| **Columns** | 192 (115 numeric, 71 categorical, 6 datetime) |
| **Train split** | 887 rows |
| **Test split** | 221 rows |
| **Geographic scope** | SSD (South Sudan) |
| **Publisher** | Johns Hopkins School of Public Health |
| **HDX last updated** | 2025-04-15 |
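
The column breakdown in the table above (numeric, categorical, datetime) can be checked against a loaded split. This is a small sketch under the assumption that "categorical" means every column that is neither numeric nor datetime; the card does not define the term, and the helper name `dtype_breakdown` is illustrative.

```python
import pandas as pd

def dtype_breakdown(df: pd.DataFrame) -> dict:
    """Count columns by broad type: numeric, datetime, and everything else."""
    numeric = df.select_dtypes(include="number").shape[1]
    datetime = df.select_dtypes(include="datetime").shape[1]
    return {
        "numeric": numeric,
        "datetime": datetime,
        # Assumption: anything not numeric or datetime counts as categorical.
        "categorical": df.shape[1] - numeric - datetime,
    }

# Toy frame with made-up values that mimics a few schema columns.
toy = pd.DataFrame({
    "id": ["a1", "b2"],
    "los": [30, 45],
    "adm_muac": [11.2, 11.9],
    "adm_date": pd.to_datetime(["2021-01-05", "2021-02-10"]),
})
print(dtype_breakdown(toy))
```

Applied to a real split, the result can be compared with the 115 / 71 / 6 breakdown reported here. Likewise, `len(train) + len(test)` can be compared with the reported total: the split sizes listed above sum to 1,108, one fewer than the 1,109 total rows stated.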
---
## Variables
**Geographic** — `state` (NBeG), `studyarm` (tsfp-covid, tsfp-stand, otp-covid), `hh_sex` (female, male), `hh_primary` (yes, no), `hh_yrs` (+10 yrs, 1-5 yrs, 5-10 yrs) and 28 others.
**Temporal** — `enrollment_date`, `adm_date`, `dc_date`, `trans_adm_date`, `trans_dc_date` and 4 others.
**Demographic** — `hh_number` (range 2.0–21.0), `hh_children` (range 0.0–14.0), `caregiver_knowage`, `caregiver_age` (range 12.0–52.0), `caregiver_age_est` and 15 others.
**Outcome / Measurement** — `income_source`, `income_similar`, `transport_cost`, `adm_zscore`, `dc_zscore` and 4 others.
**Identifier / Metadata** — `id` (mal1331, war8791, mal5323), `trans_id`, `esa_source`, `esa_processed`.
**Other** — `site` (Malualkon, Leith, Yargot), `protocol` (covid, standard), `program` (TSFP, OTP), `caregiver`, `education` and 112 others.
---
## Quick Start
```python
from datasets import load_dataset

# Download both splits from the Hugging Face Hub.
ds = load_dataset("electricsheepafrica/africa-ssd-aah-final-dataset")

# Convert each split to a pandas DataFrame.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())
```
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `id` | object | 0.0% | mal1331, war8791, mal5323 |
| `state` | object | 0.0% | NBeG |
| `site` | object | 0.0% | Malualkon, Leith, Yargot |
| `protocol` | object | 0.0% | covid, standard |
| `program` | object | 0.0% | TSFP, OTP |
| `studyarm` | object | 0.0% | tsfp-covid, tsfp-stand, otp-covid |
| `hh_sex` | object | 0.0% | female, male |
| `hh_primary` | object | 0.0% | yes, no |
| `hh_number` | float64 | 0.7% | 2.0 – 21.0 (mean 8.1698) |
| `hh_children` | float64 | 0.1% | 0.0 – 14.0 (mean 2.1038) |
| `hh_yrs` | object | 0.0% | +10 yrs, 1-5 yrs, 5-10 yrs |
| `displaced` | object | 0.0% | never, returnee, displaced, non-conflict |
| `ever_displaced` | object | 0.3% | |
| `currently_displ` | object | 0.3% | |
| `caregiver` | object | 0.0% | |
| `caregiver_relation` | object | 0.0% | |
| `caregiver_sex` | object | 0.0% | |
| `caregiver_knowage` | object | 0.0% | |
| `caregiver_age` | float64 | 39.0% | 12.0 – 52.0 (mean 28.8018) |
| `caregiver_age_est` | object | 74.3% | |
| `caregiver_age_cat` | object | 13.3% | |
| `education` | object | 79.4% | |
| `marital_status` | object | 0.0% | |
| `married` | object | 0.0% | |
| `pregnant` | object | 0.0% | |
| `breastfeeding` | object | 0.0% | |
| `hh_meals` | int64 | 0.0% | 0.0 – 5.0 (mean 1.5816) |
| `fs_enough` | object | 0.0% | |
| `fs_freq` | object | 0.0% | |
| `fs_sleep` | object | 0.0% | |
| `fs_sleep_freq` | object | 0.0% | |
| `fs_hungry` | object | 0.0% | |
| `fs_hungry_freq` | object | 0.0% | |
| `hhs_score` | float64 | 1.7% | 0.0 – 6.0 (mean 2.4725) |
| `hhs_cat` | object | 1.7% | |
| `assistance` | object | 0.0% | |
| `assist_amt` | int64 | 0.0% | 0.0 – 5.0 (mean 0.1416) |
| `exp_housing` | float64 | 0.9% | 0.0 – 200000.0 (mean 2933.758) |
| `exp_fuel` | float64 | 0.3% | 0.0 – 141000.0 (mean 2084.0886) |
| `exp_food` | float64 | 9.8% | 0.0 – 300000.0 (mean 17844.85) |
| `exp_hhitems` | float64 | 6.3% | 0.0 – 120000.0 (mean 3936.6603) |
| `exp_trans` | float64 | 0.6% | 0.0 – 140000.0 (mean 2262.8131) |
| `exp_healthcare` | float64 | 3.8% | 0.0 – 500000.0 (mean 8320.9916) |
| `exp_education` | float64 | 1.4% | 0.0 – 900000.0 (mean 5606.064) |
| `exp_debt` | float64 | 1.3% | 0.0 – 1200002.0 (mean 5964.0219) |
| `exp_other` | float64 | 0.6% | 0.0 – 100000.0 (mean 346.3702) |
| `exp_total` | int64 | 0.0% | 0.0 – 1284152.0 (mean 46777.3598) |
| `exp_similar` | object | 10.7% | |
| `income_source` | object | 0.0% | |
| `income_monthly` | float64 | 36.2% | 0.0 – 120000.0 (mean 8013.7006) |
| `income_similar` | object | 0.0% | |
| `hh_savings` | float64 | 34.8% | 0.0 – 230000.0 (mean 1932.2268) |
| `hh_debt` | float64 | 21.0% | 0.0 – 500000.0 (mean 3649.9292) |
| `child_sex` | object | 0.0% | |
| `child_dob` | datetime64[ns] | 0.0% | |
| `child_age` | float64 | 0.0% | 6.0123 – 56.8378 (mean 19.321) |
| `child_age_cat` | object | 0.0% | |
| `breastfed_currently` | object | 0.0% | |
| `siblings` | object | 0.0% | |
| `prev_am` | object | 0.0% | |
| `prev_am_yr` | float64 | 55.0% | |
| `prev_treat` | object | 54.5% | |
| `prev_treat_local` | object | 56.7% | |
| `treatment_distance` | float64 | 0.0% | |
| `hours` | float64 | 0.0% | |
| `minutes` | int64 | 0.0% | |
| `tot_minutes` | int64 | 0.0% | |
| `transport_cost` | float64 | 1.2% | |
| `child_diag` | object | 54.5% | |
| `child_diag_freq` | float64 | 59.9% | |
| `adm_criteria` | object | 0.0% | |
| `enrollment_date` | datetime64[ns] | 0.0% | |
| `adm_date` | datetime64[ns] | 0.0% | |
| `adm_oedema` | object | 0.0% | |
| `adm_muac` | float64 | 0.0% | |
| `adm_muac_cat` | object | 0.0% | |
| `adm_weight` | float64 | 53.6% | |
| `adm_hl` | float64 | 53.6% | |
| `adm_zscore` | object | 53.6% | |
| `adm_whz` | float64 | 53.6% | |
| `adm_whz_cat` | object | 53.6% | |
| `adm_waz` | float64 | 53.6% | |
| `adm_haz` | float64 | 53.6% | |
| `adm_bmiz` | float64 | 53.6% | |
| `adm_muacz` | float64 | 0.1% | |
| `dc_date` | datetime64[ns] | 0.0% | |
| `dc_age` | float64 | 0.0% | |
| `dc_oedema` | object | 0.0% | |
| `dc_muac` | float64 | 0.0% | |
| `dc_muac_cat` | object | 0.0% | |
| `dc_weight` | float64 | 53.6% | |
| `dc_hl` | float64 | 53.6% | |
| `dc_zscore` | object | 53.6% | |
| `dc_whz` | float64 | 53.6% | |
| `dc_whz_cat` | object | 53.6% | |
| `dc_waz` | float64 | 53.6% | |
| `dc_haz` | float64 | 53.6% | |
| `dc_bmiz` | float64 | 53.6% | |
| `dc_muacz` | float64 | 0.0% | |
| `outcome` | object | 0.0% | |
| `cured_binary` | object | 0.0% | |
| `los` | int64 | 0.0% | |
| `days_recovery` | float64 | 13.3% | |
| `weight_change` | float64 | 53.6% | |
| `weight_gain` | float64 | 53.6% | |
| `muac_change` | float64 | 0.0% | |
| `muac_gain` | float64 | 0.0% | |
| `tsfp_eligible` | object | 58.1% | |
| `trnsf_success` | object | 63.7% | |
| `trans_id` | object | 76.8% | |
| `trans_adm_age` | float64 | 76.8% | |
| `trans_adm_date` | datetime64[ns] | 76.8% | |
| `trans_adm_oedema` | object | 76.7% | |
| `trans_adm_muac` | float64 | 76.8% | |
| `trans_dc_date` | datetime64[ns] | 76.8% | |
| `trans_dc_age` | float64 | 76.8% | |
| `trans_dc_oedema` | object | 76.7% | |
| `trans_dc_muac` | float64 | 76.8% | |
| `trans_outcome` | object | 76.9% | |
| `trans_cured_binary` | object | 62.0% | |
| `trans_los` | float64 | 76.8% | |
| `trans_muac_change` | float64 | 76.8% | |
| `trans_muac_gain` | float64 | 76.8% | |
| `mam_adm_date` | float64 | 18.8% | |
| `mam_adm_age` | float64 | 18.8% | |
| `otp_trans` | object | 18.8% | |
| `mam_adm_oedema` | float64 | 18.8% | |
| `mam_adm_muac` | float64 | 18.8% | |
| `mam_adm_muac_cat` | object | 18.8% | |
| `mam_adm_weight` | float64 | 61.4% | |
| `mam_adm_hl` | float64 | 61.3% | |
| `mam_adm_zscore` | float64 | 61.3% | |
| `mam_adm_whz` | float64 | 61.4% | |
| `mam_adm_whz_cat` | object | 61.4% | |
| `mam_adm_waz` | float64 | 61.4% | |
| `mam_adm_haz` | float64 | 61.3% | |
| `mam_adm_bmiz` | float64 | 61.4% | |
| `mam_adm_muacz` | float64 | 42.0% | |
| `mam_dc_date` | float64 | 18.8% | |
| `mam_dc_age` | float64 | 18.8% | |
| `mam_dc_oedema` | float64 | 18.8% | |
| `mam_dc_muac` | float64 | 18.8% | |
| `mam_dc_muac_cat` | object | 18.8% | |
| `mam_dc_weight` | float64 | 61.3% | |
| `mam_dc_hl` | float64 | 61.3% | |
| `mam_dc_zscore` | float64 | 61.3% | |
| `mam_dc_whz` | float64 | 61.4% | |
| `mam_dc_whz_cat` | object | 61.4% | |
| `mam_dc_waz` | float64 | 61.4% | |
| `mam_dc_haz` | float64 | 61.4% | |
| `mam_dc_bmiz` | float64 | 61.4% | |
| `mam_dc_muacz` | float64 | 41.9% | |
| `mam_outcome` | object | 18.8% | |
| `mam_cured_binary` | object | 18.8% | |
| `mam_los` | float64 | 18.8% | |
| `mam_days_recovery` | float64 | 38.1% | |
| `mam_weight_change` | float64 | 61.4% | |
| `mam_weight_gain` | float64 | 61.4% | |
| `mam_muac_change` | float64 | 18.8% | |
| `mam_muac_gain` | float64 | 18.8% | |
| `sam_adm_date` | float64 | 58.1% | |
| `sam_adm_age` | float64 | 58.1% | |
| `sam_adm_oedema` | float64 | 58.1% | |
| `sam_adm_muac` | float64 | 58.1% | |
| `sam_adm_weight` | float64 | 79.6% | |
| `sam_adm_hl` | float64 | 79.6% | |
| `sam_adm_zscore` | float64 | 79.6% | |
| `sam_adm_whz` | float64 | 79.6% | |
| `sam_adm_waz` | float64 | 79.6% | |
| `sam_adm_haz` | float64 | 79.6% | |
| `sam_adm_bmiz` | float64 | 79.6% | |
| `sam_adm_muacz` | float64 | 58.1% | |
| `sam_dc_date` | float64 | 58.1% | |
| `sam_dc_age` | float64 | 58.1% | |
| `sam_dc_oedema` | float64 | 58.1% | |
| `sam_dc_muac` | float64 | 58.1% | |
| `sam_dc_weight` | float64 | 79.6% | |
| `sam_dc_hl` | float64 | 79.6% | |
| `sam_dc_zscore` | float64 | 79.6% | |
| `sam_dc_whz` | float64 | 79.7% | |
| `sam_dc_waz` | float64 | 79.7% | |
| `sam_dc_haz` | float64 | 79.7% | |
| `sam_dc_bmiz` | float64 | 79.7% | |
| `sam_outcome` | object | 58.1% | |
| `sam_cured_binary` | object | 58.1% | |
| `sam_los` | float64 | 58.1% | |
| `sam_weight_change` | float64 | 79.6% | |
| `sam_weight_gain` | float64 | 79.6% | |
| `sam_muac_change` | float64 | 58.1% | |
| `sam_muac_gain` | float64 | 58.1% | |
| `esa_source` | object | 0.0% | |
| `esa_processed` | object | 0.0% | |
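
Many schema columns share a name prefix (`adm_`, `dc_`, `trans_`, `mam_`, `sam_`, `exp_`), and columns within a prefix group often have near-identical null percentages. The sketch below averages the null share per prefix group so that such blocks can be inspected together. The helper name is illustrative and the toy values are made up; what each prefix means clinically is not stated in the card and is not assumed here.

```python
import pandas as pd

def null_share_by_prefix(df: pd.DataFrame, prefixes: list[str]) -> pd.Series:
    """Average fraction of missing cells across columns starting with each prefix."""
    shares = {}
    for prefix in prefixes:
        cols = [c for c in df.columns if c.startswith(prefix)]
        if cols:
            # Mean over rows gives per-column null share; mean again averages the group.
            shares[prefix] = float(df[cols].isna().mean().mean())
    return pd.Series(shares, dtype="float64")

# Toy frame with made-up values and a few schema-style column names.
toy = pd.DataFrame({
    "adm_muac": [11.0, 11.4, 12.0, 11.8],
    "adm_weight": [7.1, None, None, 8.0],
    "sam_adm_muac": [None, None, None, 10.9],
    "sam_dc_muac": [None, None, 12.6, 12.7],
})
result = null_share_by_prefix(toy, ["adm_", "sam_"])
print(result)
```

Note that `startswith("adm_")` does not match `sam_adm_muac`, so the admission columns and the `sam_`-prefixed columns stay in separate groups.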
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `hh_number` | 2.0 | 21.0 | 8.1698 | 8.0 |
| `hh_children` | 0.0 | 14.0 | 2.1038 | 2.0 |
| `caregiver_age` | 12.0 | 52.0 | 28.8018 | 29.0 |
| `hh_meals` | 0.0 | 5.0 | 1.5816 | 2.0 |
| `hhs_score` | 0.0 | 6.0 | 2.4725 | 3.0 |
| `assist_amt` | 0.0 | 5.0 | 0.1416 | 0.0 |
| `exp_housing` | 0.0 | 200000.0 | 2933.758 | 0.0 |
| `exp_fuel` | 0.0 | 141000.0 | 2084.0886 | 0.0 |
| `exp_food` | 0.0 | 300000.0 | 17844.85 | 6000.0 |
| `exp_hhitems` | 0.0 | 120000.0 | 3936.6603 | 0.0 |
| `exp_trans` | 0.0 | 140000.0 | 2262.8131 | 0.0 |
| `exp_healthcare` | 0.0 | 500000.0 | 8320.9916 | 2000.0 |
| `exp_education` | 0.0 | 900000.0 | 5606.064 | 0.0 |
| `exp_debt` | 0.0 | 1200002.0 | 5964.0219 | 0.0 |
| `exp_other` | 0.0 | 100000.0 | 346.3702 | 0.0 |
---
## Curation
Raw data were downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. 31 columns with >80% missing values were removed: `hh_education`, `assist_freq`, `assist_food`, `assist_wash`, `assist_liveli`, `assist_voucher`, ... 3 columns were cast from string to numeric or datetime based on parse-success rate (>85% threshold). The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
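
The cleaning steps described above can be sketched in pandas as follows. This is an illustration under stated assumptions, not the pipeline Electric Sheep Africa actually ran: the function names are hypothetical, missing-value markers are matched exactly (case-sensitive), the split uses `DataFrame.sample` rather than whatever splitter was really used, and the string-to-numeric/datetime casting step is left out.

```python
import pandas as pd

MISSING_MARKERS = ["N/A", "null", "none", "-", "unknown", "no data", "#N/A"]

def clean(df: pd.DataFrame, max_missing: float = 0.80) -> pd.DataFrame:
    out = df.copy()
    # Lowercase names and collapse non-alphanumeric runs to "_" (simple snake_case).
    out.columns = (
        out.columns.str.strip()
        .str.lower()
        .str.replace(r"[^0-9a-z]+", "_", regex=True)
        .str.strip("_")
    )
    # Replace cells that exactly equal a marker with NaN (assumption: exact match).
    out = out.mask(out.isin(MISSING_MARKERS))
    # Keep only columns whose missing share does not exceed the threshold.
    keep = out.isna().mean() <= max_missing
    return out.loc[:, keep]

def split_80_20(df: pd.DataFrame, seed: int = 42):
    # Assumption: a simple random 80/20 row split with a fixed seed.
    df = df.reset_index(drop=True)
    train_part = df.sample(frac=0.8, random_state=seed)
    test_part = df.drop(train_part.index)
    return train_part, test_part

# Toy raw frame with made-up values.
raw = pd.DataFrame({
    "Child Sex": ["female", "male", "N/A", "male", "female"],
    "ADM-MUAC": [11.2, 11.5, 10.9, 11.8, 12.0],
    "Assist Freq": ["-", "unknown", "null", "no data", "#N/A"],  # all markers
})
cleaned = clean(raw)
train_df, test_df = split_80_20(cleaned)
print(cleaned.columns.tolist(), len(train_df), len(test_df))
```

In the toy frame, `Assist Freq` consists entirely of missing-value markers, so it exceeds the 80% threshold and is dropped, while the single `N/A` in `Child Sex` becomes `NaN` and the column is kept.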
---
## Limitations
- Data originates from Johns Hopkins School of Public Health and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- The following columns have >20% missing values and should be treated with caution in modelling: `caregiver_age`, `caregiver_age_est`, `education`, `income_monthly`, `hh_savings`, `hh_debt`, `prev_am_yr`, `prev_treat`, ...
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/ssd-aah-final-dataset) for the publisher's own methodology notes and caveats.
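
The >20% missing-value caution above can be applied programmatically before modelling. The sketch below lists columns whose null share exceeds a threshold; the helper name is illustrative and the toy values are made up.

```python
import pandas as pd

def high_missing_columns(df: pd.DataFrame, threshold: float = 0.20) -> pd.Series:
    """Columns whose fraction of missing values exceeds the threshold, worst first."""
    share = df.isna().mean()
    return share[share > threshold].sort_values(ascending=False)

# Toy frame with made-up values.
toy = pd.DataFrame({
    "caregiver_age": [25.0, None, None, 30.0, 41.0],
    "education": [None, None, None, None, "primary"],
    "los": [30, 45, 60, 28, 50],
})
flagged = high_missing_columns(toy)
print(flagged)
```

On the real data, `high_missing_columns(train)` would return the full list that is truncated in the bullet above. Identical null shares across related columns in the schema (for example 53.6% for `adm_weight`, `adm_hl`, and `adm_whz`) suggest that some values are absent together rather than at random, so dropping or imputing those columns is best decided per group.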
---
## Citation
```bibtex
@dataset{hdx_africa_ssd_aah_final_dataset,
title = {A Prospective Evaluation of Nutrition Protocol Adaptations in the Context of the COVID-19 Pandemic in South Sudan},
author = {Johns Hopkins School of Public Health},
year = {2025},
url = {https://data.humdata.org/dataset/ssd-aah-final-dataset},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider: electricsheepafrica



