
electricsheepafrica/africa-ssd-aah-final-dataset

Hugging Face · Updated 2026-04-20 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-ssd-aah-final-dataset
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- ssd
pretty_name: "A Prospective Evaluation of Nutrition Protocol Adaptations in the Context of the COVID-19 Pandemic in South Sudan"
dataset_info:
  splits:
  - name: train
    num_examples: 887
  - name: test
    num_examples: 221
---

# A Prospective Evaluation of Nutrition Protocol Adaptations in the Context of the COVID-19 Pandemic in South Sudan

**Publisher:** Johns Hopkins School of Public Health · **Source:** [HDX](https://data.humdata.org/dataset/ssd-aah-final-dataset) · **License:** `cc-by` · **Updated:** 2025-04-15

---

## Abstract

This is the underlying data for a manuscript pending publication. The manuscript reports results from a non-randomized prospective cohort study that compared outcomes of acutely malnourished children treated under South Sudan's standard national CMAM protocol with those of children treated under the COVID-modified CMAM protocol, using standard nutrition program indicators (i.e., recovery rates and length of stay).

Each row in this dataset represents first-level administrative unit observations. Temporal coverage is indicated by the `child_dob` and `enrollment_date` columns. Geographic scope: **SSD**.
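The two indicators named in the abstract, recovery rate and length of stay, can be summarised per protocol with a simple group-by. The sketch below is only a descriptive summary on invented rows, not the manuscript's analysis. The column names `protocol`, `cured_binary`, and `los` come from the schema, but the `"yes"`/`"no"` encoding of `cured_binary` and the helper `summarize_by_protocol` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical rows: column names come from the schema, values are invented.
# ASSUMPTION: `cured_binary` is encoded as "yes"/"no"; inspect the real values first.
toy = pd.DataFrame({
    "protocol": ["covid", "covid", "standard", "standard", "standard"],
    "cured_binary": ["yes", "no", "yes", "yes", "no"],
    "los": [42, 70, 35, 49, 60],  # length of stay, assumed to be in days
})


def summarize_by_protocol(df: pd.DataFrame, cured_label: str = "yes") -> pd.DataFrame:
    """Per-protocol sample size, recovery rate, and mean length of stay."""
    cured = df["cured_binary"].eq(cured_label)  # boolean flag per child
    return (
        df.assign(cured=cured)
        .groupby("protocol")
        .agg(
            n=("cured", "size"),
            recovery_rate=("cured", "mean"),  # share of rows flagged as cured
            mean_los=("los", "mean"),
        )
    )


summary = summarize_by_protocol(toy)
print(summary)
```

Because the study is non-randomized, a raw comparison like this does not adjust for differences between the two groups.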

*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*

---

## Dataset Characteristics

| Property | Value |
|---|---|
| **Domain** | Food security and nutrition |
| **Unit of observation** | First-level administrative unit observations |
| **Rows (total)** | 1,109 |
| **Columns** | 192 (115 numeric, 71 categorical, 6 datetime) |
| **Train split** | 887 rows |
| **Test split** | 221 rows |
| **Geographic scope** | SSD |
| **Publisher** | Johns Hopkins School of Public Health |
| **HDX last updated** | 2025-04-15 |

---

## Variables

**Geographic:** `state` (NBeG), `studyarm` (tsfp-covid, tsfp-stand, otp-covid), `hh_sex` (female, male), `hh_primary` (yes, no), `hh_yrs` (+10 yrs, 1-5 yrs, 5-10 yrs) and 28 others.

**Temporal:** `enrollment_date`, `adm_date`, `dc_date`, `trans_adm_date`, `trans_dc_date` and 4 others.

**Demographic:** `hh_number` (range 2.0–21.0), `hh_children` (range 0.0–14.0), `caregiver_knowage`, `caregiver_age` (range 12.0–52.0), `caregiver_age_est` and 15 others.

**Outcome / Measurement:** `income_source`, `income_similar`, `transport_cost`, `adm_zscore`, `dc_zscore` and 4 others.

**Identifier / Metadata:** `id` (mal1331, war8791, mal5323), `trans_id`, `esa_source`, `esa_processed`.

**Other:** `site` (Malualkon, Leith, Yargot), `protocol` (covid, standard), `program` (TSFP, OTP), `caregiver`, `education` and 112 others.
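
The numeric / categorical / datetime breakdown reported under Dataset Characteristics can be recomputed from column dtypes. The following is a minimal sketch on a toy frame: the column names are borrowed from the schema, the values and dates are invented, and `dtype_groups` is a hypothetical helper, not part of the curation pipeline.

```python
import pandas as pd

# Toy frame: column names borrowed from the schema, values invented for illustration.
toy = pd.DataFrame({
    "site": ["Malualkon", "Leith", "Yargot"],
    "protocol": ["covid", "standard", "covid"],
    "hh_number": [8.0, 5.0, 12.0],
    "hh_meals": [2, 1, 2],
    "enrollment_date": pd.to_datetime(["2021-01-05", "2021-02-10", "2021-03-15"]),
})


def dtype_groups(df: pd.DataFrame) -> dict[str, list[str]]:
    """Split column names into numeric, datetime, and everything else."""
    numeric = df.select_dtypes(include="number").columns.tolist()
    datetime_cols = df.select_dtypes(include="datetime").columns.tolist()
    # Whatever is neither numeric nor datetime is treated as categorical here.
    categorical = [c for c in df.columns if c not in numeric and c not in datetime_cols]
    return {"numeric": numeric, "categorical": categorical, "datetime": datetime_cols}


groups = dtype_groups(toy)
for kind, cols in groups.items():
    print(f"{kind}: {len(cols)} columns -> {cols}")
```

Applied to the full dataset instead of `toy`, the three list lengths should be comparable to the counts in the characteristics table, assuming the card uses the same dtype-based grouping.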

---

## Quick Start

```python
from datasets import load_dataset

# Download both splits from the Hugging Face Hub.
ds = load_dataset("electricsheepafrica/africa-ssd-aah-final-dataset")

# Convert each split to a pandas DataFrame.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())
```

---

## Schema

| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `id` | object | 0.0% | mal1331, war8791, mal5323 |
| `state` | object | 0.0% | NBeG |
| `site` | object | 0.0% | Malualkon, Leith, Yargot |
| `protocol` | object | 0.0% | covid, standard |
| `program` | object | 0.0% | TSFP, OTP |
| `studyarm` | object | 0.0% | tsfp-covid, tsfp-stand, otp-covid |
| `hh_sex` | object | 0.0% | female, male |
| `hh_primary` | object | 0.0% | yes, no |
| `hh_number` | float64 | 0.7% | 2.0 – 21.0 (mean 8.1698) |
| `hh_children` | float64 | 0.1% | 0.0 – 14.0 (mean 2.1038) |
| `hh_yrs` | object | 0.0% | +10 yrs, 1-5 yrs, 5-10 yrs |
| `displaced` | object | 0.0% | never, returnee, displaced, non-conflict |
| `ever_displaced` | object | 0.3% | |
| `currently_displ` | object | 0.3% | |
| `caregiver` | object | 0.0% | |
| `caregiver_relation` | object | 0.0% | |
| `caregiver_sex` | object | 0.0% | |
| `caregiver_knowage` | object | 0.0% | |
| `caregiver_age` | float64 | 39.0% | 12.0 – 52.0 (mean 28.8018) |
| `caregiver_age_est` | object | 74.3% | |
| `caregiver_age_cat` | object | 13.3% | |
| `education` | object | 79.4% | |
| `marital_status` | object | 0.0% | |
| `married` | object | 0.0% | |
| `pregnant` | object | 0.0% | |
| `breastfeeding` | object | 0.0% | |
| `hh_meals` | int64 | 0.0% | 0.0 – 5.0 (mean 1.5816) |
| `fs_enough` | object | 0.0% | |
| `fs_freq` | object | 0.0% | |
| `fs_sleep` | object | 0.0% | |
| `fs_sleep_freq` | object | 0.0% | |
| `fs_hungry` | object | 0.0% | |
| `fs_hungry_freq` | object | 0.0% | |
| `hhs_score` | float64 | 1.7% | 0.0 – 6.0 (mean 2.4725) |
| `hhs_cat` | object | 1.7% | |
| `assistance` | object | 0.0% | |
| `assist_amt` | int64 | 0.0% | 0.0 – 5.0 (mean 0.1416) |
| `exp_housing` | float64 | 0.9% | 0.0 – 200000.0 (mean 2933.758) |
| `exp_fuel` | float64 | 0.3% | 0.0 – 141000.0 (mean 2084.0886) |
| `exp_food` | float64 | 9.8% | 0.0 – 300000.0 (mean 17844.85) |
| `exp_hhitems` | float64 | 6.3% | 0.0 – 120000.0 (mean 3936.6603) |
| `exp_trans` | float64 | 0.6% | 0.0 – 140000.0 (mean 2262.8131) |
| `exp_healthcare` | float64 | 3.8% | 0.0 – 500000.0 (mean 8320.9916) |
| `exp_education` | float64 | 1.4% | 0.0 – 900000.0 (mean 5606.064) |
| `exp_debt` | float64 | 1.3% | 0.0 – 1200002.0 (mean 5964.0219) |
| `exp_other` | float64 | 0.6% | 0.0 – 100000.0 (mean 346.3702) |
| `exp_total` | int64 | 0.0% | 0.0 – 1284152.0 (mean 46777.3598) |
| `exp_similar` | object | 10.7% | |
| `income_source` | object | 0.0% | |
| `income_monthly` | float64 | 36.2% | 0.0 – 120000.0 (mean 8013.7006) |
| `income_similar` | object | 0.0% | |
| `hh_savings` | float64 | 34.8% | 0.0 – 230000.0 (mean 1932.2268) |
| `hh_debt` | float64 | 21.0% | 0.0 – 500000.0 (mean 3649.9292) |
| `child_sex` | object | 0.0% | |
| `child_dob` | datetime64[ns] | 0.0% | |
| `child_age` | float64 | 0.0% | 6.0123 – 56.8378 (mean 19.321) |
| `child_age_cat` | object | 0.0% | |
| `breastfed_currently` | object | 0.0% | |
| `siblings` | object | 0.0% | |
| `prev_am` | object | 0.0% | |
| `prev_am_yr` | float64 | 55.0% | |
| `prev_treat` | object | 54.5% | |
| `prev_treat_local` | object | 56.7% | |
| `treatment_distance` | float64 | 0.0% | |
| `hours` | float64 | 0.0% | |
| `minutes` | int64 | 0.0% | |
| `tot_minutes` | int64 | 0.0% | |
| `transport_cost` | float64 | 1.2% | |
| `child_diag` | object | 54.5% | |
| `child_diag_freq` | float64 | 59.9% | |
| `adm_criteria` | object | 0.0% | |
| `enrollment_date` | datetime64[ns] | 0.0% | |
| `adm_date` | datetime64[ns] | 0.0% | |
| `adm_oedema` | object | 0.0% | |
| `adm_muac` | float64 | 0.0% | |
| `adm_muac_cat` | object | 0.0% | |
| `adm_weight` | float64 | 53.6% | |
| `adm_hl` | float64 | 53.6% | |
| `adm_zscore` | object | 53.6% | |
| `adm_whz` | float64 | 53.6% | |
| `adm_whz_cat` | object | 53.6% | |
| `adm_waz` | float64 | 53.6% | |
| `adm_haz` | float64 | 53.6% | |
| `adm_bmiz` | float64 | 53.6% | |
| `adm_muacz` | float64 | 0.1% | |
| `dc_date` | datetime64[ns] | 0.0% | |
| `dc_age` | float64 | 0.0% | |
| `dc_oedema` | object | 0.0% | |
| `dc_muac` | float64 | 0.0% | |
| `dc_muac_cat` | object | 0.0% | |
| `dc_weight` | float64 | 53.6% | |
| `dc_hl` | float64 | 53.6% | |
| `dc_zscore` | object | 53.6% | |
| `dc_whz` | float64 | 53.6% | |
| `dc_whz_cat` | object | 53.6% | |
| `dc_waz` | float64 | 53.6% | |
| `dc_haz` | float64 | 53.6% | |
| `dc_bmiz` | float64 | 53.6% | |
| `dc_muacz` | float64 | 0.0% | |
| `outcome` | object | 0.0% | |
| `cured_binary` | object | 0.0% | |
| `los` | int64 | 0.0% | |
| `days_recovery` | float64 | 13.3% | |
| `weight_change` | float64 | 53.6% | |
| `weight_gain` | float64 | 53.6% | |
| `muac_change` | float64 | 0.0% | |
| `muac_gain` | float64 | 0.0% | |
| `tsfp_eligible` | object | 58.1% | |
| `trnsf_success` | object | 63.7% | |
| `trans_id` | object | 76.8% | |
| `trans_adm_age` | float64 | 76.8% | |
| `trans_adm_date` | datetime64[ns] | 76.8% | |
| `trans_adm_oedema` | object | 76.7% | |
| `trans_adm_muac` | float64 | 76.8% | |
| `trans_dc_date` | datetime64[ns] | 76.8% | |
| `trans_dc_age` | float64 | 76.8% | |
| `trans_dc_oedema` | object | 76.7% | |
| `trans_dc_muac` | float64 | 76.8% | |
| `trans_outcome` | object | 76.9% | |
| `trans_cured_binary` | object | 62.0% | |
| `trans_los` | float64 | 76.8% | |
| `trans_muac_change` | float64 | 76.8% | |
| `trans_muac_gain` | float64 | 76.8% | |
| `mam_adm_date` | float64 | 18.8% | |
| `mam_adm_age` | float64 | 18.8% | |
| `otp_trans` | object | 18.8% | |
| `mam_adm_oedema` | float64 | 18.8% | |
| `mam_adm_muac` | float64 | 18.8% | |
| `mam_adm_muac_cat` | object | 18.8% | |
| `mam_adm_weight` | float64 | 61.4% | |
| `mam_adm_hl` | float64 | 61.3% | |
| `mam_adm_zscore` | float64 | 61.3% | |
| `mam_adm_whz` | float64 | 61.4% | |
| `mam_adm_whz_cat` | object | 61.4% | |
| `mam_adm_waz` | float64 | 61.4% | |
| `mam_adm_haz` | float64 | 61.3% | |
| `mam_adm_bmiz` | float64 | 61.4% | |
| `mam_adm_muacz` | float64 | 42.0% | |
| `mam_dc_date` | float64 | 18.8% | |
| `mam_dc_age` | float64 | 18.8% | |
| `mam_dc_oedema` | float64 | 18.8% | |
| `mam_dc_muac` | float64 | 18.8% | |
| `mam_dc_muac_cat` | object | 18.8% | |
| `mam_dc_weight` | float64 | 61.3% | |
| `mam_dc_hl` | float64 | 61.3% | |
| `mam_dc_zscore` | float64 | 61.3% | |
| `mam_dc_whz` | float64 | 61.4% | |
| `mam_dc_whz_cat` | object | 61.4% | |
| `mam_dc_waz` | float64 | 61.4% | |
| `mam_dc_haz` | float64 | 61.4% | |
| `mam_dc_bmiz` | float64 | 61.4% | |
| `mam_dc_muacz` | float64 | 41.9% | |
| `mam_outcome` | object | 18.8% | |
| `mam_cured_binary` | object | 18.8% | |
| `mam_los` | float64 | 18.8% | |
| `mam_days_recovery` | float64 | 38.1% | |
| `mam_weight_change` | float64 | 61.4% | |
| `mam_weight_gain` | float64 | 61.4% | |
| `mam_muac_change` | float64 | 18.8% | |
| `mam_muac_gain` | float64 | 18.8% | |
| `sam_adm_date` | float64 | 58.1% | |
| `sam_adm_age` | float64 | 58.1% | |
| `sam_adm_oedema` | float64 | 58.1% | |
| `sam_adm_muac` | float64 | 58.1% | |
| `sam_adm_weight` | float64 | 79.6% | |
| `sam_adm_hl` | float64 | 79.6% | |
| `sam_adm_zscore` | float64 | 79.6% | |
| `sam_adm_whz` | float64 | 79.6% | |
| `sam_adm_waz` | float64 | 79.6% | |
| `sam_adm_haz` | float64 | 79.6% | |
| `sam_adm_bmiz` | float64 | 79.6% | |
| `sam_adm_muacz` | float64 | 58.1% | |
| `sam_dc_date` | float64 | 58.1% | |
| `sam_dc_age` | float64 | 58.1% | |
| `sam_dc_oedema` | float64 | 58.1% | |
| `sam_dc_muac` | float64 | 58.1% | |
| `sam_dc_weight` | float64 | 79.6% | |
| `sam_dc_hl` | float64 | 79.6% | |
| `sam_dc_zscore` | float64 | 79.6% | |
| `sam_dc_whz` | float64 | 79.7% | |
| `sam_dc_waz` | float64 | 79.7% | |
| `sam_dc_haz` | float64 | 79.7% | |
| `sam_dc_bmiz` | float64 | 79.7% | |
| `sam_outcome` | object | 58.1% | |
| `sam_cured_binary` | object | 58.1% | |
| `sam_los` | float64 | 58.1% | |
| `sam_weight_change` | float64 | 79.6% | |
| `sam_weight_gain` | float64 | 79.6% | |
| `sam_muac_change` | float64 | 58.1% | |
| `sam_muac_gain` | float64 | 58.1% | |
| `esa_source` | object | 0.0% | |
| `esa_processed` | object | 0.0% | |

---

## Numeric Summary

| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `hh_number` | 2.0 | 21.0 | 8.1698 | 8.0 |
| `hh_children` | 0.0 | 14.0 | 2.1038 | 2.0 |
| `caregiver_age` | 12.0 | 52.0 | 28.8018 | 29.0 |
| `hh_meals` | 0.0 | 5.0 | 1.5816 | 2.0 |
| `hhs_score` | 0.0 | 6.0 | 2.4725 | 3.0 |
| `assist_amt` | 0.0 | 5.0 | 0.1416 | 0.0 |
| `exp_housing` | 0.0 | 200000.0 | 2933.758 | 0.0 |
| `exp_fuel` | 0.0 | 141000.0 | 2084.0886 | 0.0 |
| `exp_food` | 0.0 | 300000.0 | 17844.85 | 6000.0 |
| `exp_hhitems` | 0.0 | 120000.0 | 3936.6603 | 0.0 |
| `exp_trans` | 0.0 | 140000.0 | 2262.8131 | 0.0 |
| `exp_healthcare` | 0.0 | 500000.0 | 8320.9916 | 2000.0 |
| `exp_education` | 0.0 | 900000.0 | 5606.064 | 0.0 |
| `exp_debt` | 0.0 | 1200002.0 | 5964.0219 | 0.0 |
| `exp_other` | 0.0 | 100000.0 | 346.3702 | 0.0 |

---

## Curation

Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. 31 columns with >80% missing values were removed: `hh_education`, `assist_freq`, `assist_food`, `assist_wash`, `assist_liveli`, `assist_voucher`, ... 3 columns were cast from string to numeric or datetime based on parse-success rate (>85% threshold). The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.

---

## Limitations

- Data originates from Johns Hopkins School of Public Health and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- The following columns have >20% missing values and should be treated with caution in modelling: `caregiver_age`, `caregiver_age_est`, `education`, `income_monthly`, `hh_savings`, `hh_debt`, `prev_am_yr`, `prev_treat`, ...
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/ssd-aah-final-dataset) for the publisher's own methodology notes and caveats.

---

## Citation

```bibtex
@dataset{hdx_africa_ssd_aah_final_dataset,
  title  = {A Prospective Evaluation of Nutrition Protocol Adaptations in the Context of the COVID-19 Pandemic in South Sudan},
  author = {Johns Hopkins School of Public Health},
  year   = {2025},
  url    = {https://data.humdata.org/dataset/ssd-aah-final-dataset},
  note   = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```

---

*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica): Africa's ML dataset infrastructure. Lagos, Nigeria.*
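
The curation steps and the >20% missingness caveat described above can be sketched in pandas. This is an illustration under stated assumptions, not the actual ESA pipeline: the helpers `curate`, `flag_sparse`, and `split_80_20` are hypothetical, marker matching is assumed to be exact and case-sensitive, `DataFrame.sample` stands in for whatever splitter was really used, and the toy rows are invented.

```python
import pandas as pd

# Missing-value markers listed in the Curation section.
MISSING_MARKERS = ["N/A", "null", "none", "-", "unknown", "no data", "#N/A"]


def curate(df: pd.DataFrame, max_missing: float = 0.80) -> pd.DataFrame:
    """Blank out marker strings, then drop columns with more than max_missing nulls."""
    cleaned = df.mask(df.isin(MISSING_MARKERS))  # exact matches become missing
    keep = cleaned.columns[cleaned.isna().mean() <= max_missing]
    return cleaned[keep]


def flag_sparse(df: pd.DataFrame, threshold: float = 0.20) -> list[str]:
    """Columns whose missing share exceeds the threshold (handle with caution)."""
    share = df.isna().mean()
    return share[share > threshold].index.tolist()


def split_80_20(df: pd.DataFrame, seed: int = 42):
    """ASSUMPTION: a seeded random sample approximates the card's 80/20 split."""
    train = df.sample(frac=0.8, random_state=seed)
    test = df.drop(train.index)
    return train, test


# Invented toy rows for illustration only.
raw = pd.DataFrame({
    "id": [f"row{i}" for i in range(10)],
    "site": ["Leith", "N/A", "Yargot", "Leith", "unknown",
             "Yargot", "no data", "Leith", "Yargot", "Leith"],
    "mostly_empty": ["-"] * 9 + ["x"],  # 90% missing after cleaning
})

clean = curate(raw)
sparse = flag_sparse(clean)
train, test = split_80_20(clean)
print(list(clean.columns), sparse, len(train), len(test))
```

Here `mostly_empty` is dropped because 90% of its values are markers, while `site` survives but is flagged because 30% of its values are missing.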
Provided by:
electricsheepafrica