electricsheepafrica/africa-risk-factors-for-hospitalization-and-death-from-covid-19-in-humanitarian-settings
Source: Hugging Face · Updated 2026-04-07 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-risk-factors-for-hospitalization-and-death-from-covid-19-in-humanitarian-settings
Resource description:

---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: other
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- tabular-classification
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- cod
- ssd
pretty_name: "Risk Factors for Hospitalization and Death from COVID-19 in Humanitarian Settings"
dataset_info:
  splits:
  - name: train
    num_examples: 415
  - name: test
    num_examples: 103
---
# Risk Factors for Hospitalization and Death from COVID-19 in Humanitarian Settings
**Publisher:** Johns Hopkins School of Public Health · **Source:** [HDX](https://data.humdata.org/dataset/risk-factors-for-hospitalization-and-death-from-covid-19-in-humanitarian-settings) · **License:** `other-pd-nr` · **Updated:** 2025-04-10

---
## Abstract
Deidentified dataset used for the analysis presented in "Risk Factors for Hospitalization and Death from COVID-19: A Prospective Cohort Study in South Sudan and Eastern Democratic Republic of the Congo" by Leidman et al.

Each row appears to describe one deidentified study participant (age, sex, symptoms, exposures, and outcomes), even though the card's metadata labels the unit of observation as "first-level administrative unit observations". Data was last updated on HDX on 2025-04-10. Geographic scope: **COD, SSD**.

*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*

---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Humanitarian and development data |
| **Unit of observation** | First-level administrative unit observations |
| **Rows (total)** | 519 |
| **Columns** | 58 (3 numeric, 55 categorical, 0 datetime) |
| **Train split** | 415 rows |
| **Test split** | 103 rows |
| **Geographic scope** | COD, SSD |
| **Publisher** | Johns Hopkins School of Public Health |
| **HDX last updated** | 2025-04-10 |
---
## Variables
**Geographic:** `age_years` (range 0.0–84.0), `anemic_yn` (no, yes), `anyinfectious` (no, yes), `country_x` (DRC, SSD), `exposure_carecovidpatient` and 35 others.

**Demographic:** `age_categories` (18-44, 45-64, 65+).

**Outcome / Measurement:** `covidcasestatus_new` (confirmed (rtpcr), confirmed (antigen), Suspect- no valid test), `form_case_case_id`.

**Identifier / Metadata:** `unnamed_0` (range 1.0–519.0), `esa_source`, `esa_processed`.

**Other:** `anemia` (missing, non-anemic, mild), `bmi_adult` (range 15.0474–42.5605), `bmi_cat` (normal weight, overweight, obesity), `bmi_obese` (not obese, obese), `deceased` (no, deceased) and 7 others.

---
## Quick Start
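```python
from datasets import load_dataset

# Download both splits from the Hugging Face Hub.
ds = load_dataset("electricsheepafrica/africa-risk-factors-for-hospitalization-and-death-from-covid-19-in-humanitarian-settings")

# Convert each split to a pandas DataFrame for inspection.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)   # (rows, columns)
print(train.head())  # first five records
```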
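
For the `tabular-classification` task listed in the card metadata, one possible first step is to turn an outcome column into a binary target. The sketch below is an illustration only. It assumes the `deceased` values shown in the Schema table (`no`, `deceased`) and runs on a small synthetic frame rather than the real data. The helper name `binary_target` is hypothetical and not part of the dataset or of any library.

```python
import pandas as pd

def binary_target(df: pd.DataFrame, column: str = "deceased",
                  positive: str = "deceased") -> pd.Series:
    """Return 1 for the positive label, 0 otherwise, and <NA> where the outcome is missing."""
    target = (df[column] == positive).astype("Int64")
    return target.mask(df[column].isna())

# Synthetic rows for illustration; real data would come from ds["train"].to_pandas().
demo = pd.DataFrame({
    "country_x": ["DRC", "DRC", "SSD", "SSD", "SSD"],
    "deceased": ["no", "deceased", "no", None, "deceased"],
})
y = binary_target(demo)

# Share of deaths per country, ignoring rows with a missing outcome.
rates = demo.assign(died=y).groupby("country_x")["died"].mean()
print(rates)
```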
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `unnamed_0` | int64 | 0.0% | 1.0 – 519.0 (mean 260.0) |
| `age_categories` | object | 0.0% | 18-44, 45-64, 65+ |
| `age_years` | int64 | 0.0% | 0.0 – 84.0 (mean 40.6089) |
| `anemia` | object | 0.0% | missing, non-anemic, mild |
| `anemic_yn` | object | 59.0% | no, yes |
| `anyinfectious` | object | 43.0% | no, yes |
| `bmi_adult` | float64 | 13.5% | 15.0474 – 42.5605 (mean 25.9959) |
| `bmi_cat` | object | 9.4% | normal weight, overweight, obesity |
| `bmi_obese` | object | 0.0% | not obese, obese |
| `country_x` | object | 0.0% | DRC, SSD |
| `covidcasestatus_new` | object | 0.0% | confirmed (rtpcr), confirmed (antigen), Suspect- no valid test |
| `deceased` | object | 0.4% | no, deceased |
| `ever_hospitalized` | object | 0.0% | Never hospitalized (Outpatient managed), Ever hospitalized |
| `exposure_carecovidpatient` | object | 2.5% | |
| `exposure_contactcovidcase` | object | 49.5% | |
| `exposure_hcw` | object | 0.6% | |
| `exposure_visithcf` | object | 0.4% | |
| `exposure_workingoutsidehome` | object | 0.0% | |
| `fever` | object | 0.2% | |
| `highbloodpressure_enrollment_13080` | object | 2.1% | |
| `history_asthma` | object | 0.2% | |
| `history_cardiac` | object | 1.0% | |
| `history_chronic_cat` | object | 0.0% | |
| `history_diabetes` | object | 0.0% | |
| `history_hiv` | object | 43.9% | |
| `history_hypertension` | object | 0.2% | |
| `history_pulmonary` | object | 0.6% | |
| `history_tb` | object | 0.2% | |
| `hypothermia_enrollment` | object | 0.2% | |
| `form_case_case_id` | object | 0.0% | |
| `low_oxygen94_enrollment` | object | 7.3% | |
| `obs_appearance` | object | 0.0% | |
| `region_collapsed` | object | 0.6% | |
| `region_manuscript` | object | 0.6% | |
| `respiratorydistress` | object | 0.0% | |
| `sex` | object | 0.0% | |
| `smoke` | object | 0.8% | |
| `studysite_manuscript` | object | 0.0% | |
| `suspected_malaria` | object | 0.0% | |
| `symptoms_abdominalpain_x` | object | 0.0% | |
| `symptoms_any` | object | 0.0% | |
| `symptoms_appetite` | object | 0.0% | |
| `symptoms_chestpain_x` | object | 0.2% | |
| `symptoms_chills_x` | object | 0.0% | |
| `symptoms_cough_x` | object | 0.0% | |
| `symptoms_diarrhea_x` | object | 0.0% | |
| `symptoms_fatigue_x` | object | 0.0% | |
| `symptoms_headache_x` | object | 0.2% | |
| `symptoms_jointpain_x` | object | 0.2% | |
| `symptoms_nausea_x` | object | 0.0% | |
| `symptoms_runnynose_x` | object | 0.2% | |
| `symptoms_sob_x` | object | 0.0% | |
| `symptoms_sorethroat_x` | object | 0.2% | |
| `symptoms_tasteorsmell` | object | 0.8% | |
| `symptoms_wheezing_x` | object | 0.0% | |
| `test_reason` | object | 5.4% | |
| `esa_source` | object | 0.0% | |
| `esa_processed` | object | 0.0% | |
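
The type, null-percentage, and range columns above can be recomputed from a loaded split. The following is a minimal sketch, assuming a pandas DataFrame such as the `train` frame from the Quick Start. The function name `schema_summary` and the demo values are hypothetical, and the output format only approximates this table.

```python
import pandas as pd

def schema_summary(df: pd.DataFrame, n_samples: int = 3) -> pd.DataFrame:
    """Build one row per column: dtype, percentage of nulls, and a range or sample values."""
    rows = []
    for col in df.columns:
        s = df[col]
        if pd.api.types.is_numeric_dtype(s):
            # Numeric columns: report min, max, and mean.
            detail = f"{s.min()} to {s.max()} (mean {s.mean():.4f})"
        else:
            # Categorical columns: report the first few distinct non-null values.
            detail = ", ".join(map(str, s.dropna().unique()[:n_samples]))
        rows.append({
            "column": col,
            "type": str(s.dtype),
            "null_pct": round(100 * s.isna().mean(), 1),
            "detail": detail,
        })
    return pd.DataFrame(rows).set_index("column")

# Synthetic demo frame; column names follow the schema, values are made up.
demo = pd.DataFrame({
    "age_years": [30, 41, 25, 67],
    "anemic_yn": ["no", None, None, "yes"],
})
summary = schema_summary(demo)
print(summary)
```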
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `unnamed_0` | 1.0 | 519.0 | 260.0 | 260.0 |
| `age_years` | 0.0 | 84.0 | 40.6089 | 39.0 |
| `bmi_adult` | 15.0474 | 42.5605 | 25.9959 | 25.5588 |
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. One column with more than 80% missing values, `uncontrolled_diabetes8`, was removed. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
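
The steps described above can be approximated with pandas. This is a sketch under stated assumptions, not the curator's actual pipeline: the helper names `to_snake_case` and `curate` are hypothetical, marker matching is assumed to be case-insensitive, and the split uses `DataFrame.sample`, which may differ from the method actually used. Writing the result with `DataFrame.to_parquet(path, compression="snappy")` would cover the final step and is left out so the example runs without a Parquet engine.

```python
import re

import numpy as np
import pandas as pd

# Markers listed in the Curation paragraph, lowercased for comparison.
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}

def to_snake_case(name: str) -> str:
    """Lowercase a column name and collapse non-alphanumeric runs into '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")

def unify_missing(value):
    """Map known string markers to NaN and leave every other value unchanged."""
    if isinstance(value, str) and value.strip().lower() in MISSING_MARKERS:
        return np.nan
    return value

def curate(raw: pd.DataFrame, max_missing: float = 0.80, seed: int = 42):
    df = raw.reset_index(drop=True).rename(columns=to_snake_case)
    df = df.apply(lambda col: col.map(unify_missing))
    # Keep only columns whose share of missing values is at or below the threshold.
    df = df.loc[:, df.isna().mean() <= max_missing]
    # 80/20 split with a fixed seed (assumed mechanism).
    train = df.sample(frac=0.8, random_state=seed)
    test = df.drop(train.index)
    return train, test

# Synthetic raw table for illustration only.
raw = pd.DataFrame({
    "Unnamed: 0": range(1, 11),
    "Age Years": [30, 41, 25, 67, 52, 38, 44, 29, 73, 60],
    "History HIV": ["no", "N/A", "yes", "unknown", "no", "-", "no", "yes", "null", "no"],
    "Mostly Empty": ["yes"] + ["#N/A"] * 9,
})
train, test = curate(raw)
print(list(train.columns), len(train), len(test))
```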
---
## Limitations
- Data originates from Johns Hopkins School of Public Health and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- The following columns have >20% missing values and should be treated with caution in modelling: `anemic_yn`, `anyinfectious`, `exposure_contactcovidcase`, `history_hiv`.
- This dataset spans 2 countries; geographic and methodological inconsistencies across national boundaries may affect cross-country comparability.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/risk-factors-for-hospitalization-and-death-from-covid-19-in-humanitarian-settings) for the publisher's own methodology notes and caveats.
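
The 20% missingness threshold mentioned in the list above can be checked directly on a loaded split. The sketch below is one possible approach, shown on synthetic values. The helper name `high_missing_columns` is hypothetical, and filling gaps with an explicit `missing` category (as the `anemia` column already does) is only one option among several.

```python
import pandas as pd

def high_missing_columns(df: pd.DataFrame, threshold: float = 0.20) -> pd.Series:
    """Return the share of missing values for columns above the threshold, highest first."""
    share = df.isna().mean()
    return share[share > threshold].sort_values(ascending=False)

# Synthetic demo frame; column names follow the schema, values are made up.
demo = pd.DataFrame({
    "history_hiv": ["no", None, "yes", None, "no", None],            # 50% missing
    "fever": ["yes", "no", None, "yes", "no", "no"],                 # about 17% missing
    "symptoms_cough_x": ["yes", "yes", "no", "yes", "no", "yes"],    # complete
})
flagged = high_missing_columns(demo)
print(flagged)

# One option: keep the rows and make missingness an explicit category.
filled = demo[flagged.index].fillna("missing")
print(filled["history_hiv"].value_counts())
```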
---
## Citation
```bibtex
@dataset{hdx_africa_risk_factors_for_hospitalization_and_death_from_covid_19_in_humanitarian_settings,
title = {Risk Factors for Hospitalization and Death from COVID-19 in Humanitarian Settings},
author = {Johns Hopkins School of Public Health},
year = {2025},
url = {https://data.humdata.org/dataset/risk-factors-for-hospitalization-and-death-from-covid-19-in-humanitarian-settings},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider: electricsheepafrica