electricsheepafrica/africa-aid-flows-all
Hugging Face · Updated 2026-04-20 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-aid-flows-all
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: other
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
- other
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- disease
- geodata
- west-africa
- gin
- lbr
- sle
pretty_name: "American Red Cross West Africa Project"
dataset_info:
  splits:
  - name: train
    num_examples: 5760
  - name: test
    num_examples: 1440
---
# American Red Cross West Africa Project
**Publisher:** American Red Cross (inactive) · **Source:** [HDX](https://data.humdata.org/dataset/american-red-cross-west-africa-project) · **License:** `hdx-multi` · **Updated:** 2026-04-19
---
## Abstract
From February to October 2016, the American Red Cross and its local Red Cross partners completed an effort to extensively map areas within a 15-kilometer distance of the shared borders between Guinea, Liberia, and Sierra Leone.
The goal of this work was to create an open and comprehensive dataset of communities for West Africa and to ensure that decision makers, humanitarian workers, and community stakeholders are better aware of water, sanitation, health, and community resources before and during the next crisis.
To complete this mapping, the American Red Cross launched a mapping center in Guéckédou, Guinea, and used it as both a base of operations and a community engagement facility. Over 100 volunteers helped to complete a rapid assessment of the region, visiting over 7,000 communities by motorbike to complete a vulnerability survey with the village leader. Next, over 100 communities were selected for a round of detailed mapping, focusing on collecting the location and information about every water point, health facility and other community resource in the area. In addition, we led technical skills trainings and mapping events both in Guéckédou and across the region.
**ALL DATA EXCEPT FOR THE OpenStreetMap EXTRACTS ARE LICENSED AS CC-BY 4.0**
Each row in this dataset represents one individual-level record. Temporal coverage is indicated by the `finalstart` and `finalend` columns. Geographic scope: **GIN, LBR, SLE**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Public health |
| **Unit of observation** | Individual-level records |
| **Rows (total)** | 7,200 |
| **Columns** | 203 (158 numeric, 41 categorical, 4 datetime) |
| **Train split** | 5,760 rows |
| **Test split** | 1,440 rows |
| **Geographic scope** | GIN, LBR, SLE |
| **Publisher** | American Red Cross (inactive) |
| **HDX last updated** | 2026-04-19 |
---
## Variables
**Geographic:** `x` (range 1.0 to 7200.0), `hh_gps_latitude` (range 6.8446 to 10.292), `hh_gps_longitude` (range -13.3118 to -8.1514), `country` (Guinea, Sierra Leone, Liberia), `primary`, and 26 others.
**Temporal:** `water_time` (range 1.0 to 999999.0), `treat_time` (range -240.0 to 999999.0), `birth_time` (range -4.0 to 999999.0), `ch_or3_time` (range 1.0 to 999999.0), `ch_or1_time`, and 4 others.
**Demographic:** `hh_gps_altitude` (range -1033.2 to 999995.0), `hh_gps_precision` (range 0.8 to 999995.0), `hhnum` (range 0.0 to 999999.0), `namevillage_0_altvillage`, `namevillage_1_altvillage`, and 19 others.
**Identifier / Metadata:** `unnamed_0` (range 1.0 to 7200.0), `namevill` (gueckedou_ctre, macenta_ctre, forecariah_ctre), `instanceid` (uuid:853dbd86-4aae-4454-801d-9461bd6883dc, uuid:e1a42e8e-94b7-42a9-9f62-e45b0cbfb369, uuid:2aac218d-1f2e-4d10-9019-479b714b1a9a), `landslides`, `midwife`, and 16 others.
**Other:** `market` (Koundou, Manjama, Buedu community market), `marketclean` (999995, Koundou, Gueckedou), `marketcomsum` (range 0.0 to 215.0), `marketmatch` (999995, uuid:8ce0ef6d-d843-4241-8ac1-4025ab550007, 999999), `drink_wat` (range 1.0 to 999999.0), and 113 others.
---
## Quick Start
```python
from datasets import load_dataset

# Download both splits from the Hugging Face Hub.
ds = load_dataset("electricsheepafrica/africa-aid-flows-all")

# Convert each split to a pandas DataFrame.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())
```
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `unnamed_0` | int64 | 0.0% | 1.0 – 7200.0 (mean 3600.5) |
| `x` | int64 | 0.0% | 1.0 – 7200.0 (mean 3600.5) |
| `hh_gps_latitude` | float64 | 0.0% | 6.8446 – 10.292 (mean 8.6357) |
| `hh_gps_longitude` | float64 | 0.0% | -13.3118 – -8.1514 (mean -10.8994) |
| `namevill` | object | 8.6% | gueckedou_ctre, macenta_ctre, forecariah_ctre |
| `market` | object | 12.8% | Koundou, Manjama, Buedu community market |
| `marketclean` | object | 0.0% | 999995, Koundou, Gueckedou |
| `marketcomsum` | int64 | 0.0% | 0.0 – 215.0 (mean 3.6601) |
| `marketmatch` | object | 0.0% | 999995, uuid:8ce0ef6d-d843-4241-8ac1-4025ab550007, 999999 |
| `instanceid` | object | 0.0% | uuid:853dbd86-4aae-4454-801d-9461bd6883dc, uuid:e1a42e8e-94b7-42a9-9f62-e45b0cbfb369, uuid:2aac218d-1f2e-4d10-9019-479b714b1a9a |
| `hh_gps_altitude` | float64 | 0.0% | -1033.2 – 999995.0 (mean 459.7153) |
| `hh_gps_precision` | float64 | 0.0% | 0.8 – 999995.0 (mean 144.5712) |
| `hhnum` | int64 | 0.0% | 0.0 – 999999.0 (mean 99201.3196) |
| `drink_wat` | int64 | 0.0% | 1.0 – 999999.0 (mean 96813.9822) |
| `water_time` | int64 | 0.0% | 1.0 – 999999.0 (mean 97006.8439) |
| `toilet` | int64 | 0.0% | 1.0 – 999999.0 (mean 100284.6225) |
| `floor` | int64 | 0.0% | 1.0 – 999999.0 (mean 100281.0728) |
| `treat_main` | int64 | 0.0% | 1.0 – 999999.0 (mean 94588.5533) |
| `treat_loc` | int64 | 0.0% | 1.0 – 999999.0 (mean 118889.9639) |
| `treat_time` | int64 | 0.0% | -240.0 – 999999.0 (mean 119338.9825) |
| `birth_main` | int64 | 0.0% | 1.0 – 999999.0 (mean 102089.5906) |
| `birth_loc` | int64 | 0.0% | 1.0 – 999999.0 (mean 127362.009) |
| `birth_time` | int64 | 0.0% | -4.0 – 999999.0 (mean 127679.8736) |
| `comm_health` | object | 18.4% | 95, 1, 3 |
| `ch_or3_num` | int64 | 0.0% | 0.0 – 999999.0 (mean 624898.955) |
| `ch_or3_time` | int64 | 0.0% | 1.0 – 999999.0 (mean 624859.2442) |
| `markout` | int64 | 0.0% | |
| `markcomm` | object | 23.4% | Guinea, Buedu town, Jojoima Town |
| `worship` | int64 | 0.0% | |
| `worshiploc` | object | 72.2% | 9995, Koundou centre, Gbentu |
| `ch_or1_num` | int64 | 0.0% | |
| `ch_or1_time` | int64 | 0.0% | |
| `ch_or4_num` | int64 | 0.0% | |
| `ch_or4_time` | int64 | 0.0% | |
| `ch_or2_num` | int64 | 0.0% | |
| `ch_or2_time` | int64 | 0.0% | |
| `mark_other` | object | 60.9% | Riz, Condiment, Palm oil and dry goods |
| `ebola` | int64 | 0.0% | |
| `finalstart` | datetime64[ns] | 52.5% | |
| `finalend` | datetime64[ns] | 52.5% | |
| `country` | object | 0.0% | Guinea, Sierra Leone, Liberia |
| `drought` | int64 | 0.0% | |
| `famine` | int64 | 0.0% | |
| `flooding` | int64 | 0.0% | |
| `landslides` | int64 | 0.0% | |
| `fire` | int64 | 0.0% | |
| `commebola` | int64 | 0.0% | |
| `disease` | int64 | 0.0% | |
| `distnone` | int64 | 0.0% | |
| `othdist` | int64 | 0.0% | |
| `hosgov` | int64 | 0.0% | |
| `hosfor` | int64 | 0.0% | |
| `hosngo` | int64 | 0.0% | |
| `hosunsp` | int64 | 0.0% | |
| `clingov` | int64 | 0.0% | |
| `clinfor` | int64 | 0.0% | |
| `clinngo` | int64 | 0.0% | |
| `clinunsp` | int64 | 0.0% | |
| `chw` | int64 | 0.0% | |
| `trad` | int64 | 0.0% | |
| `hospri` | int64 | 0.0% | |
| `midwife` | int64 | 0.0% | |
| `friends` | int64 | 0.0% | |
| `unsp` | int64 | 0.0% | |
| `healthnone` | int64 | 0.0% | |
| `healthref` | int64 | 0.0% | |
| `healthdk` | int64 | 0.0% | |
| `birhosgov` | int64 | 0.0% | |
| `birhosfor` | int64 | 0.0% | |
| `birhosngo` | int64 | 0.0% | |
| `birhosunsp` | int64 | 0.0% | |
| `birclingov` | int64 | 0.0% | |
| `birclinfor` | int64 | 0.0% | |
| `birclinngo` | int64 | 0.0% | |
| `birclinunsp` | int64 | 0.0% | |
| `birchw` | int64 | 0.0% | |
| `birtrad` | int64 | 0.0% | |
| `birhospri` | int64 | 0.0% | |
| `birmidwife` | int64 | 0.0% | |
| `birfriends` | int64 | 0.0% | |
| `birnone` | int64 | 0.0% | |
| `birref` | int64 | 0.0% | |
| `birdk` | int64 | 0.0% | |
| `lgmoh` | int64 | 0.0% | |
| `redcross` | int64 | 0.0% | |
| `chwcomm` | int64 | 0.0% | |
| `noworkers` | int64 | 0.0% | |
| `workoth` | int64 | 0.0% | |
| `workersref` | int64 | 0.0% | |
| `workersdk` | int64 | 0.0% | |
| `noschool` | int64 | 0.0% | |
| `primary` | int64 | 0.0% | |
| `secondary` | int64 | 0.0% | |
| `vocational` | int64 | 0.0% | |
| `university` | int64 | 0.0% | |
| `postgrad` | int64 | 0.0% | |
| `meat` | int64 | 0.0% | |
| `poultry` | int64 | 0.0% | |
| `fish` | int64 | 0.0% | |
| `fruit` | int64 | 0.0% | |
| `vegetables` | int64 | 0.0% | |
| `marketother` | int64 | 0.0% | |
| `monday` | int64 | 0.0% | |
| `tuesday` | int64 | 0.0% | |
| `wednesday` | int64 | 0.0% | |
| `thursday` | int64 | 0.0% | |
| `friday` | int64 | 0.0% | |
| `saturday` | int64 | 0.0% | |
| `sunday` | int64 | 0.0% | |
| `everyday` | int64 | 0.0% | |
| `dayref` | int64 | 0.0% | |
| `daydk` | int64 | 0.0% | |
| `whsum` | int64 | 0.0% | |
| `whmatch` | object | 2.5% | |
| `relclean` | object | 0.0% | |
| `treatmentsum` | int64 | 0.0% | |
| `treatmatch` | object | 2.4% | |
| `finalindex` | int64 | 0.0% | |
| `instancename` | object | 0.0% | |
| `formid` | object | 0.0% | |
| `deviceid` | object | 0.0% | |
| `submissiontime` | datetime64[ns, UTC] | 0.0% | |
| `namevillage_0_altvillage` | float64 | 52.5% | |
| `namevillage_1_altvillage` | float64 | 52.5% | |
| `namevillage_4_altvillage` | float64 | 52.5% | |
| `namevillage_6_altvillage` | float64 | 18.5% | |
| `loc_adm1` | object | 8.1% | |
| `loc_adm2` | object | 8.1% | |
| `urban_or_rural` | int64 | 0.0% | |
| `other_details` | object | 11.4% | |
| `comm_dist` | object | 10.6% | |
| `treat_mult` | float64 | 17.7% | |
| `birth_mult` | float64 | 20.3% | |
| `share_emerge_contact` | int64 | 0.0% | |
| `educ` | float64 | 14.4% | |
| `marktype` | object | 28.5% | |
| `markday` | float64 | 23.2% | |
| `comment` | object | 18.3% | |
| `border_crossing` | float64 | 2.5% | |
| `border_crossing_type` | float64 | 4.5% | |
| `border_crossing_name` | object | 60.0% | |
| `border_crossing_dest` | object | 60.3% | |
| `epidemic_specifics` | object | 75.5% | |
| `drink_wat_oth` | object | 78.3% | |
| `inhabited` | int64 | 0.0% | |
| `treat_loc_specify` | object | 21.9% | |
| `birth_loc_specify` | object | 28.5% | |
| `abancomm` | float64 | 69.1% | |
| `poste` | int64 | 0.0% | |
| `matron` | int64 | 0.0% | |
| `birposte` | int64 | 0.0% | |
| `birmatron` | int64 | 0.0% | |
| `schoolrefdk` | int64 | 0.0% | |
| `markettyperefdk` | int64 | 0.0% | |
| `markstall` | int64 | 0.0% | |
| `osmsurvey` | int64 | 0.0% | |
| `numhealth` | int64 | 0.0% | |
| `region` | float64 | 18.5% | |
| `schoolref` | int64 | 0.0% | |
| `schooldk` | int64 | 0.0% | |
| `loc_adm3` | object | 5.0% | |
| `loc_adm4` | object | 5.0% | |
| `district` | object | 8.5% | |
| `secteur_rural` | object | 8.8% | |
| `name_village` | object | 8.6% | |
| `name_part_village` | object | 12.7% | |
| `alt_secteur_rurale` | object | 46.4% | |
| `quartier_ou_district` | object | 44.2% | |
| `secteur_urbain` | object | 44.2% | |
| `name_carre` | object | 45.3% | |
| `names_part_village_alt_part_village` | float64 | 47.5% | |
| `names_part_village_0_alt_part_village` | float64 | 47.5% | |
| `names_part_village_1_alt_part_village` | float64 | 47.5% | |
| `names_part_village_2_alt_part_village` | float64 | 47.5% | |
| `names_part_village_3_alt_part_village` | float64 | 47.5% | |
| `altvillage` | float64 | 47.5% | |
| `version` | object | 0.0% | |
| `alts_secteur_rurale_0_alt_secteur_rurale` | float64 | 41.9% | |
| `alts_secteur_rurale_1_alt_secteur_rurale` | float64 | 41.9% | |
| `alts_village_0_altvillage` | float64 | 41.9% | |
| `alts_village_1_altvillage` | float64 | 41.9% | |
| `alts_village_2_altvillage` | float64 | 41.9% | |
| `names_part_village_4_alt_part_village` | float64 | 41.9% | |
| `names_part_village_5_alt_part_village` | float64 | 41.9% | |
| `names_part_village_6_alt_part_village` | float64 | 41.9% | |
| `alt_secteur_urbain` | float64 | 41.8% | |
| `date` | datetime64[ns] | 52.5% | |
| `prev_interview` | int64 | 0.0% | |
| `alts_village_0_altvillage_1` | float64 | 34.0% | |
| `alts_village_1_altvillage_1` | float64 | 33.2% | |
| `alts_village_2_altvillage_1` | float64 | 32.4% | |
| `constituency` | float64 | 4.1% | |
| `ward` | float64 | 3.3% | |
| `gsm_service` | int64 | 0.0% | |
| `gsm_distance` | float64 | 33.4% | |
| `adm4urb_def` | float64 | 34.0% | |
| `adm4urban_name` | float64 | 34.0% | |
| `adm5urban_name` | float64 | 34.0% | |
| `treathome` | int64 | 0.0% | |
| `birhome` | int64 | 0.0% | |
| `marketout` | int64 | 0.0% | |
| `esa_source` | object | 0.0% | |
| `esa_processed` | object | 0.0% | |
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `unnamed_0` | 1.0 | 7200.0 | 3600.5 | 3600.5 |
| `x` | 1.0 | 7200.0 | 3600.5 | 3600.5 |
| `hh_gps_latitude` | 6.8446 | 10.292 | 8.6357 | 8.5551 |
| `hh_gps_longitude` | -13.3118 | -8.1514 | -10.8994 | -10.6041 |
| `marketcomsum` | 0.0 | 215.0 | 3.6601 | 0.0 |
| `hh_gps_altitude` | -1033.2 | 999995.0 | 459.7153 | 367.3 |
| `hh_gps_precision` | 0.8 | 999995.0 | 144.5712 | 6.2 |
| `hhnum` | 0.0 | 999999.0 | 99201.3196 | 30.0 |
| `drink_wat` | 1.0 | 999999.0 | 96813.9822 | 8.0 |
| `water_time` | 1.0 | 999999.0 | 97006.8439 | 25.0 |
| `toilet` | 1.0 | 999999.0 | 100284.6225 | 10.0 |
| `floor` | 1.0 | 999999.0 | 100281.0728 | 1.0 |
| `treat_main` | 1.0 | 999999.0 | 94588.5533 | 5.0 |
| `treat_loc` | 1.0 | 999999.0 | 118889.9639 | 2.0 |
| `treat_time` | -240.0 | 999999.0 | 119338.9825 | 120.0 |
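
Several columns in the summary above have a maximum of 999999 or 999995 while their medians are small (for example, `water_time` has a median of 25.0 but a mean of 97006.8439). This pattern suggests that those values are placeholder codes rather than measurements, although this card does not document them. The sketch below masks them before computing statistics. The code list `SENTINELS`, the helper `mask_sentinels`, and the synthetic frame are illustrative assumptions, not part of the dataset or its tooling.

```python
import pandas as pd

# Assumption: these values are placeholder codes, inferred only from the
# maxima in the summary table; the dataset card does not document them.
SENTINELS = [999999, 999995]


def mask_sentinels(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy of df with assumed sentinel codes in `columns` set to NaN."""
    out = df.copy()
    for col in columns:
        # Series.mask replaces values where the condition is True with NaN.
        out[col] = out[col].mask(out[col].isin(SENTINELS))
    return out


# Tiny synthetic frame standing in for the real data.
demo = pd.DataFrame({
    "water_time": [10, 25, 999999, 40],
    "hhnum": [30, 999999, 12, 18],
})
cleaned = mask_sentinels(demo, ["water_time", "hhnum"])
print(cleaned.median())
```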
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. Six columns with more than 80% missing values were removed: `comm_dist_oth`, `floor_oth`, `ch_mult_oth`, `relgo`, `religiousmatch`, `toilet_oth`. Forty columns were cast from string to numeric or datetime where the parse-success rate exceeded 85%. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
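
The curation steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline that was actually run: the function names `to_snake_case`, `curate`, and `split_80_20`, the snake_case regex, and the use of `DataFrame.sample` for the split are all hypothetical choices. The sketch covers numeric casting only and omits the datetime case.

```python
import re

import numpy as np
import pandas as pd

# Missing-value markers listed in the Curation paragraph.
MISSING_MARKERS = ["N/A", "null", "none", "-", "unknown", "no data", "#N/A"]


def to_snake_case(name: str) -> str:
    """Lowercase a column name and collapse other characters to underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def curate(raw: pd.DataFrame, drop_above: float = 0.80,
           cast_above: float = 0.85) -> pd.DataFrame:
    """Hypothetical re-creation of the described cleaning steps."""
    df = raw.rename(columns=to_snake_case)
    df = df.replace(MISSING_MARKERS, np.nan)
    # Drop columns whose share of missing values exceeds the threshold.
    df = df.loc[:, df.isna().mean() <= drop_above].copy()
    for col in df.columns:
        if (pd.api.types.is_numeric_dtype(df[col])
                or pd.api.types.is_datetime64_any_dtype(df[col])):
            continue
        # Cast to numeric only when enough non-null values parse cleanly.
        non_null = df[col].dropna()
        parsed = pd.to_numeric(non_null, errors="coerce")
        if len(non_null) and parsed.notna().mean() > cast_above:
            df[col] = pd.to_numeric(df[col], errors="coerce")
    return df


def split_80_20(df: pd.DataFrame, seed: int = 42):
    """Shuffle with a fixed seed and split 80/20 (assumed method)."""
    train_part = df.sample(frac=0.8, random_state=seed)
    return train_part, df.drop(train_part.index)


# Synthetic input standing in for a raw HDX export.
raw = pd.DataFrame({
    "HH Num": ["30", "12", "N/A", "18", "25", "40", "7", "9", "11", "15"],
    "Mostly Empty": ["-"] * 9 + ["x"],
    "Country": ["Guinea"] * 10,
})
curated = curate(raw)
train_df, test_df = split_80_20(curated)
print(curated.dtypes)
print(len(train_df), len(test_df))
# Writing Snappy-compressed Parquet needs pyarrow or fastparquet, e.g.
# train_df.to_parquet("train.parquet", compression="snappy")
```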
---
## Limitations
- Data originates from American Red Cross (inactive) and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- The following columns have >20% missing values and should be treated with caution in modelling: `markcomm`, `worshiploc`, `mark_other`, `finalstart`, `finalend`, `namevillage_0_altvillage`, `namevillage_1_altvillage`, `namevillage_4_altvillage`....
- This dataset spans 3 countries; geographic and methodological inconsistencies across national boundaries may affect cross-country comparability.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/american-red-cross-west-africa-project) for the publisher's own methodology notes and caveats.
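
Because the list of sparse columns above is truncated, it may be useful to recompute it from the data. The following is a minimal sketch, assuming the split has already been loaded into a pandas DataFrame; the helper name `sparse_columns` and the synthetic frame are illustrative only. Note that it counts only `NaN` values, so any placeholder codes stored as ordinary numbers would not be included.

```python
import numpy as np
import pandas as pd


def sparse_columns(df: pd.DataFrame, threshold: float = 0.20) -> pd.Series:
    """Return the missing-value share of columns above `threshold`, highest first."""
    missing_share = df.isna().mean()
    return missing_share[missing_share > threshold].sort_values(ascending=False)


# Synthetic frame reusing a few column names from the schema.
demo = pd.DataFrame({
    "country": ["Guinea", "Liberia", "Sierra Leone", "Guinea"],
    "worshiploc": [np.nan, np.nan, np.nan, "Gbentu"],
    "markcomm": ["Guinea", np.nan, "Buedu town", "Jojoima Town"],
})
result = sparse_columns(demo)
print(result)
```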
---
## Citation
```bibtex
@dataset{hdx_africa_aid_flows_all,
title = {American Red Cross West Africa Project},
author = {American Red Cross (inactive)},
year = {2026},
url = {https://data.humdata.org/dataset/american-red-cross-west-africa-project},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider:
electricsheepafrica



