electricsheepafrica/africa-aid-flows-all

Hugging Face · Updated 2026-04-20 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-aid-flows-all
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: other
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
- other
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- disease
- geodata
- west-africa
- gin
- lbr
- sle
pretty_name: "American Red Cross West Africa Project"
dataset_info:
  splits:
  - name: train
    num_examples: 5760
  - name: test
    num_examples: 1440
---

# American Red Cross West Africa Project

**Publisher:** American Red Cross (inactive) · **Source:** [HDX](https://data.humdata.org/dataset/american-red-cross-west-africa-project) · **License:** `hdx-multi` · **Updated:** 2026-04-19

---

## Abstract

From February to October 2016, the American Red Cross and its local Red Cross partners extensively mapped the areas within 15 kilometers of the shared borders between Guinea, Liberia, and Sierra Leone. The goal was to create an open, comprehensive dataset of West African communities so that decision makers, humanitarian workers, and community stakeholders are better aware of water, sanitation, health, and community resources before and during the next crisis.

To carry out the mapping, the American Red Cross set up a mapping center in Guéckédou, Guinea, which served as both a base of operations and a community engagement facility. More than 100 volunteers completed a rapid assessment of the region. They traveled by motorbike to over 7,000 communities and ran a vulnerability survey with each village leader. Next, more than 100 communities were selected for a round of detailed mapping. This round recorded the location of, and information about, every water point, health facility, and other community resource in the area. We also led technical skills training sessions and mapping events in Guéckédou and across the region.
**All data except for the OpenStreetMap extracts are licensed under CC BY 4.0.**

Each row in this dataset is an individual-level record. Temporal coverage is indicated by the `finalstart` and `finalend` columns. Geographic scope: **GIN, LBR, SLE** (Guinea, Liberia, Sierra Leone).

*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*

---

## Dataset Characteristics

| Property | Value |
|---|---|
| **Domain** | Public health |
| **Unit of observation** | Individual-level records |
| **Rows (total)** | 7,200 |
| **Columns** | 203 (158 numeric, 41 categorical, 4 datetime) |
| **Train split** | 5,760 rows |
| **Test split** | 1,440 rows |
| **Geographic scope** | GIN, LBR, SLE |
| **Publisher** | American Red Cross (inactive) |
| **HDX last updated** | 2026-04-19 |

---

## Variables

**Geographic:** `x` (range 1.0 to 7200.0), `hh_gps_latitude` (range 6.8446 to 10.292), `hh_gps_longitude` (range -13.3118 to -8.1514), `country` (Guinea, Sierra Leone, Liberia), `primary`, and 26 others.

**Temporal:** `water_time` (range 1.0 to 999999.0), `treat_time` (range -240.0 to 999999.0), `birth_time` (range -4.0 to 999999.0), `ch_or3_time` (range 1.0 to 999999.0), `ch_or1_time`, and 4 others.

**Demographic:** `hh_gps_altitude` (range -1033.2 to 999995.0), `hh_gps_precision` (range 0.8 to 999995.0), `hhnum` (range 0.0 to 999999.0), `namevillage_0_altvillage`, `namevillage_1_altvillage`, and 19 others.

**Identifier / Metadata:** `unnamed_0` (range 1.0 to 7200.0), `namevill` (gueckedou_ctre, macenta_ctre, forecariah_ctre), `instanceid` (uuid:853dbd86-4aae-4454-801d-9461bd6883dc, uuid:e1a42e8e-94b7-42a9-9f62-e45b0cbfb369, uuid:2aac218d-1f2e-4d10-9019-479b714b1a9a), `landslides`, `midwife`, and 16 others.

**Other:** `market` (Koundou, Manjama, Buedu community market), `marketclean` (999995, Koundou, Gueckedou), `marketcomsum` (range 0.0 to 215.0), `marketmatch` (999995, uuid:8ce0ef6d-d843-4241-8ac1-4025ab550007, 999999), `drink_wat` (range 1.0 to 999999.0), and 113 others.
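Several numeric ranges above end at 999995 or 999999. For example, `hh_gps_altitude` reaches 999995.0, which is far beyond any altitude on Earth. This suggests the values are survey codes for missing or non-applicable answers rather than real measurements. The card does not document these codes. The following is a minimal sketch that *assumes* 999995 and 999999 mean "missing" and masks them to `NaN` in numeric columns. `SENTINEL_CODES` and `mask_sentinels` are hypothetical names. Check the codes against the publisher's documentation on HDX before relying on this.

```python
import pandas as pd

# ASSUMPTION: 999995 and 999999 are undocumented survey codes for
# missing / not-applicable answers, not real measurements.
SENTINEL_CODES = (999995, 999999)


def mask_sentinels(df: pd.DataFrame, codes=SENTINEL_CODES) -> pd.DataFrame:
    """Return a copy of df with sentinel codes in numeric columns set to NaN."""
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        # Series.mask replaces values where the condition is True with NaN;
        # int64 columns are upcast to float64 so they can hold NaN.
        out[col] = out[col].mask(out[col].isin(codes))
    return out


# Example on a tiny synthetic frame (made-up values, not real records).
demo = pd.DataFrame({
    "hhnum": [30, 999999, 12],
    "hh_gps_altitude": [367.3, 999995.0, 410.0],
    "country": ["Guinea", "Liberia", "Sierra Leone"],
})
clean = mask_sentinels(demo)
print(clean)
print(clean["hhnum"].median())  # → 21.0 (median of 30 and 12 only)
```

Pandas `int64` columns cannot hold `NaN`, so any masked integer column comes back as `float64`.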
---

## Quick Start

```python
from datasets import load_dataset

# Download both splits from the Hugging Face Hub
ds = load_dataset("electricsheepafrica/africa-aid-flows-all")

# Convert each split to a pandas DataFrame
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())
```

---

## Schema

| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `unnamed_0` | int64 | 0.0% | 1.0 – 7200.0 (mean 3600.5) |
| `x` | int64 | 0.0% | 1.0 – 7200.0 (mean 3600.5) |
| `hh_gps_latitude` | float64 | 0.0% | 6.8446 – 10.292 (mean 8.6357) |
| `hh_gps_longitude` | float64 | 0.0% | -13.3118 – -8.1514 (mean -10.8994) |
| `namevill` | object | 8.6% | gueckedou_ctre, macenta_ctre, forecariah_ctre |
| `market` | object | 12.8% | Koundou, Manjama, Buedu community market |
| `marketclean` | object | 0.0% | 999995, Koundou, Gueckedou |
| `marketcomsum` | int64 | 0.0% | 0.0 – 215.0 (mean 3.6601) |
| `marketmatch` | object | 0.0% | 999995, uuid:8ce0ef6d-d843-4241-8ac1-4025ab550007, 999999 |
| `instanceid` | object | 0.0% | uuid:853dbd86-4aae-4454-801d-9461bd6883dc, uuid:e1a42e8e-94b7-42a9-9f62-e45b0cbfb369, uuid:2aac218d-1f2e-4d10-9019-479b714b1a9a |
| `hh_gps_altitude` | float64 | 0.0% | -1033.2 – 999995.0 (mean 459.7153) |
| `hh_gps_precision` | float64 | 0.0% | 0.8 – 999995.0 (mean 144.5712) |
| `hhnum` | int64 | 0.0% | 0.0 – 999999.0 (mean 99201.3196) |
| `drink_wat` | int64 | 0.0% | 1.0 – 999999.0 (mean 96813.9822) |
| `water_time` | int64 | 0.0% | 1.0 – 999999.0 (mean 97006.8439) |
| `toilet` | int64 | 0.0% | 1.0 – 999999.0 (mean 100284.6225) |
| `floor` | int64 | 0.0% | 1.0 – 999999.0 (mean 100281.0728) |
| `treat_main` | int64 | 0.0% | 1.0 – 999999.0 (mean 94588.5533) |
| `treat_loc` | int64 | 0.0% | 1.0 – 999999.0 (mean 118889.9639) |
| `treat_time` | int64 | 0.0% | -240.0 – 999999.0 (mean 119338.9825) |
| `birth_main` | int64 | 0.0% | 1.0 – 999999.0 (mean 102089.5906) |
| `birth_loc` | int64 | 0.0% | 1.0 – 999999.0 (mean 127362.009) |
| `birth_time` | int64 | 0.0% | -4.0 – 999999.0 (mean 127679.8736) |
| `comm_health` | object | 18.4% | 95, 1, 3 |
| `ch_or3_num` | int64 | 0.0% | 0.0 – 999999.0 (mean 624898.955) |
| `ch_or3_time` | int64 | 0.0% | 1.0 – 999999.0 (mean 624859.2442) |
| `markout` | int64 | 0.0% | |
| `markcomm` | object | 23.4% | Guinea, Buedu town, Jojoima Town |
| `worship` | int64 | 0.0% | |
| `worshiploc` | object | 72.2% | 9995, Koundou centre, Gbentu |
| `ch_or1_num` | int64 | 0.0% | |
| `ch_or1_time` | int64 | 0.0% | |
| `ch_or4_num` | int64 | 0.0% | |
| `ch_or4_time` | int64 | 0.0% | |
| `ch_or2_num` | int64 | 0.0% | |
| `ch_or2_time` | int64 | 0.0% | |
| `mark_other` | object | 60.9% | Riz, Condiment, Palm oil and dry goods |
| `ebola` | int64 | 0.0% | |
| `finalstart` | datetime64[ns] | 52.5% | |
| `finalend` | datetime64[ns] | 52.5% | |
| `country` | object | 0.0% | Guinea, Sierra Leone, Liberia |
| `drought` | int64 | 0.0% | |
| `famine` | int64 | 0.0% | |
| `flooding` | int64 | 0.0% | |
| `landslides` | int64 | 0.0% | |
| `fire` | int64 | 0.0% | |
| `commebola` | int64 | 0.0% | |
| `disease` | int64 | 0.0% | |
| `distnone` | int64 | 0.0% | |
| `othdist` | int64 | 0.0% | |
| `hosgov` | int64 | 0.0% | |
| `hosfor` | int64 | 0.0% | |
| `hosngo` | int64 | 0.0% | |
| `hosunsp` | int64 | 0.0% | |
| `clingov` | int64 | 0.0% | |
| `clinfor` | int64 | 0.0% | |
| `clinngo` | int64 | 0.0% | |
| `clinunsp` | int64 | 0.0% | |
| `chw` | int64 | 0.0% | |
| `trad` | int64 | 0.0% | |
| `hospri` | int64 | 0.0% | |
| `midwife` | int64 | 0.0% | |
| `friends` | int64 | 0.0% | |
| `unsp` | int64 | 0.0% | |
| `healthnone` | int64 | 0.0% | |
| `healthref` | int64 | 0.0% | |
| `healthdk` | int64 | 0.0% | |
| `birhosgov` | int64 | 0.0% | |
| `birhosfor` | int64 | 0.0% | |
| `birhosngo` | int64 | 0.0% | |
| `birhosunsp` | int64 | 0.0% | |
| `birclingov` | int64 | 0.0% | |
| `birclinfor` | int64 | 0.0% | |
| `birclinngo` | int64 | 0.0% | |
| `birclinunsp` | int64 | 0.0% | |
| `birchw` | int64 | 0.0% | |
| `birtrad` | int64 | 0.0% | |
| `birhospri` | int64 | 0.0% | |
| `birmidwife` | int64 | 0.0% | |
| `birfriends` | int64 | 0.0% | |
| `birnone` | int64 | 0.0% | |
| `birref` | int64 | 0.0% | |
| `birdk` | int64 | 0.0% | |
| `lgmoh` | int64 | 0.0% | |
| `redcross` | int64 | 0.0% | |
| `chwcomm` | int64 | 0.0% | |
| `noworkers` | int64 | 0.0% | |
| `workoth` | int64 | 0.0% | |
| `workersref` | int64 | 0.0% | |
| `workersdk` | int64 | 0.0% | |
| `noschool` | int64 | 0.0% | |
| `primary` | int64 | 0.0% | |
| `secondary` | int64 | 0.0% | |
| `vocational` | int64 | 0.0% | |
| `university` | int64 | 0.0% | |
| `postgrad` | int64 | 0.0% | |
| `meat` | int64 | 0.0% | |
| `poultry` | int64 | 0.0% | |
| `fish` | int64 | 0.0% | |
| `fruit` | int64 | 0.0% | |
| `vegetables` | int64 | 0.0% | |
| `marketother` | int64 | 0.0% | |
| `monday` | int64 | 0.0% | |
| `tuesday` | int64 | 0.0% | |
| `wednesday` | int64 | 0.0% | |
| `thursday` | int64 | 0.0% | |
| `friday` | int64 | 0.0% | |
| `saturday` | int64 | 0.0% | |
| `sunday` | int64 | 0.0% | |
| `everyday` | int64 | 0.0% | |
| `dayref` | int64 | 0.0% | |
| `daydk` | int64 | 0.0% | |
| `whsum` | int64 | 0.0% | |
| `whmatch` | object | 2.5% | |
| `relclean` | object | 0.0% | |
| `treatmentsum` | int64 | 0.0% | |
| `treatmatch` | object | 2.4% | |
| `finalindex` | int64 | 0.0% | |
| `instancename` | object | 0.0% | |
| `formid` | object | 0.0% | |
| `deviceid` | object | 0.0% | |
| `submissiontime` | datetime64[ns, UTC] | 0.0% | |
| `namevillage_0_altvillage` | float64 | 52.5% | |
| `namevillage_1_altvillage` | float64 | 52.5% | |
| `namevillage_4_altvillage` | float64 | 52.5% | |
| `namevillage_6_altvillage` | float64 | 18.5% | |
| `loc_adm1` | object | 8.1% | |
| `loc_adm2` | object | 8.1% | |
| `urban_or_rural` | int64 | 0.0% | |
| `other_details` | object | 11.4% | |
| `comm_dist` | object | 10.6% | |
| `treat_mult` | float64 | 17.7% | |
| `birth_mult` | float64 | 20.3% | |
| `share_emerge_contact` | int64 | 0.0% | |
| `educ` | float64 | 14.4% | |
| `marktype` | object | 28.5% | |
| `markday` | float64 | 23.2% | |
| `comment` | object | 18.3% | |
| `border_crossing` | float64 | 2.5% | |
| `border_crossing_type` | float64 | 4.5% | |
| `border_crossing_name` | object | 60.0% | |
| `border_crossing_dest` | object | 60.3% | |
| `epidemic_specifics` | object | 75.5% | |
| `drink_wat_oth` | object | 78.3% | |
| `inhabited` | int64 | 0.0% | |
| `treat_loc_specify` | object | 21.9% | |
| `birth_loc_specify` | object | 28.5% | |
| `abancomm` | float64 | 69.1% | |
| `poste` | int64 | 0.0% | |
| `matron` | int64 | 0.0% | |
| `birposte` | int64 | 0.0% | |
| `birmatron` | int64 | 0.0% | |
| `schoolrefdk` | int64 | 0.0% | |
| `markettyperefdk` | int64 | 0.0% | |
| `markstall` | int64 | 0.0% | |
| `osmsurvey` | int64 | 0.0% | |
| `numhealth` | int64 | 0.0% | |
| `region` | float64 | 18.5% | |
| `schoolref` | int64 | 0.0% | |
| `schooldk` | int64 | 0.0% | |
| `loc_adm3` | object | 5.0% | |
| `loc_adm4` | object | 5.0% | |
| `district` | object | 8.5% | |
| `secteur_rural` | object | 8.8% | |
| `name_village` | object | 8.6% | |
| `name_part_village` | object | 12.7% | |
| `alt_secteur_rurale` | object | 46.4% | |
| `quartier_ou_district` | object | 44.2% | |
| `secteur_urbain` | object | 44.2% | |
| `name_carre` | object | 45.3% | |
| `names_part_village_alt_part_village` | float64 | 47.5% | |
| `names_part_village_0_alt_part_village` | float64 | 47.5% | |
| `names_part_village_1_alt_part_village` | float64 | 47.5% | |
| `names_part_village_2_alt_part_village` | float64 | 47.5% | |
| `names_part_village_3_alt_part_village` | float64 | 47.5% | |
| `altvillage` | float64 | 47.5% | |
| `version` | object | 0.0% | |
| `alts_secteur_rurale_0_alt_secteur_rurale` | float64 | 41.9% | |
| `alts_secteur_rurale_1_alt_secteur_rurale` | float64 | 41.9% | |
| `alts_village_0_altvillage` | float64 | 41.9% | |
| `alts_village_1_altvillage` | float64 | 41.9% | |
| `alts_village_2_altvillage` | float64 | 41.9% | |
| `names_part_village_4_alt_part_village` | float64 | 41.9% | |
| `names_part_village_5_alt_part_village` | float64 | 41.9% | |
| `names_part_village_6_alt_part_village` | float64 | 41.9% | |
| `alt_secteur_urbain` | float64 | 41.8% | |
| `date` | datetime64[ns] | 52.5% | |
| `prev_interview` | int64 | 0.0% | |
| `alts_village_0_altvillage_1` | float64 | 34.0% | |
| `alts_village_1_altvillage_1` | float64 | 33.2% | |
| `alts_village_2_altvillage_1` | float64 | 32.4% | |
| `constituency` | float64 | 4.1% | |
| `ward` | float64 | 3.3% | |
| `gsm_service` | int64 | 0.0% | |
| `gsm_distance` | float64 | 33.4% | |
| `adm4urb_def` | float64 | 34.0% | |
| `adm4urban_name` | float64 | 34.0% | |
| `adm5urban_name` | float64 | 34.0% | |
| `treathome` | int64 | 0.0% | |
| `birhome` | int64 | 0.0% | |
| `marketout` | int64 | 0.0% | |
| `esa_source` | object | 0.0% | |
| `esa_processed` | object | 0.0% | |

---

## Numeric Summary

| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `unnamed_0` | 1.0 | 7200.0 | 3600.5 | 3600.5 |
| `x` | 1.0 | 7200.0 | 3600.5 | 3600.5 |
| `hh_gps_latitude` | 6.8446 | 10.292 | 8.6357 | 8.5551 |
| `hh_gps_longitude` | -13.3118 | -8.1514 | -10.8994 | -10.6041 |
| `marketcomsum` | 0.0 | 215.0 | 3.6601 | 0.0 |
| `hh_gps_altitude` | -1033.2 | 999995.0 | 459.7153 | 367.3 |
| `hh_gps_precision` | 0.8 | 999995.0 | 144.5712 | 6.2 |
| `hhnum` | 0.0 | 999999.0 | 99201.3196 | 30.0 |
| `drink_wat` | 1.0 | 999999.0 | 96813.9822 | 8.0 |
| `water_time` | 1.0 | 999999.0 | 97006.8439 | 25.0 |
| `toilet` | 1.0 | 999999.0 | 100284.6225 | 10.0 |
| `floor` | 1.0 | 999999.0 | 100281.0728 | 1.0 |
| `treat_main` | 1.0 | 999999.0 | 94588.5533 | 5.0 |
| `treat_loc` | 1.0 | 999999.0 | 118889.9639 | 2.0 |
| `treat_time` | -240.0 | 999999.0 | 119338.9825 | 120.0 |

---

## Curation

Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`.
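The marker unification described above can be approximated in pandas. This is a minimal sketch, not Electric Sheep Africa's actual pipeline. It assumes markers are compared case-insensitively after trimming whitespace. `MISSING_MARKERS`, `unify_missing_markers`, and `null_pct` are hypothetical names. `null_pct` recomputes a per-column missing percentage comparable to the Null % column of the schema table.

```python
import pandas as pd

# Markers listed in the Curation notes. Case-insensitive, whitespace-trimmed
# matching is an ASSUMPTION; the card does not say how values were compared.
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def unify_missing_markers(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df where string cells equal to a marker become NaN."""
    out = df.copy()
    for col in out.columns:
        # Check real strings only, so numbers and existing NaN/None are left alone.
        is_marker = out[col].map(
            lambda v: isinstance(v, str) and v.strip().lower() in MISSING_MARKERS
        ).astype(bool)
        if is_marker.any():
            out[col] = out[col].mask(is_marker)
    return out


def null_pct(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values per column, rounded to one decimal place."""
    return (df.isna().mean() * 100).round(1)


demo = pd.DataFrame({
    "market": ["Koundou", "N/A", " unknown ", None],  # made-up values
    "marketcomsum": [0, 3, 0, 1],
})
clean = unify_missing_markers(demo)
pct = null_pct(clean)
print(pct)
print(pct[pct > 20].index.tolist())  # → ['market']
```

The helper only reassigns columns that actually contain a marker, so numeric columns keep their original dtype.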
Six columns with more than 80% missing values were removed: `comm_dist_oth`, `floor_oth`, `ch_mult_oth`, `relgo`, `religiousmatch`, `toilet_oth`. Forty columns were cast from string to numeric or datetime because more than 85% of their values parsed successfully. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.

---

## Limitations

- Data originates from the American Red Cross (inactive) and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct misreported values, definitional inconsistencies, or sampling bias in the original collection.
- The following columns have more than 20% missing values and should be treated with caution in modelling: `markcomm`, `worshiploc`, `mark_other`, `finalstart`, `finalend`, `namevillage_0_altvillage`, `namevillage_1_altvillage`, `namevillage_4_altvillage`, ...
- This dataset spans three countries; geographic and methodological inconsistencies across national boundaries may affect cross-country comparability.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/american-red-cross-west-africa-project) for the publisher's own methodology notes and caveats.

---

## Citation

```bibtex
@dataset{hdx_africa_aid_flows_all,
  title  = {American Red Cross West Africa Project},
  author = {{American Red Cross}},
  year   = {2026},
  url    = {https://data.humdata.org/dataset/american-red-cross-west-africa-project},
  note   = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```

---

*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica): Africa's ML dataset infrastructure. Lagos, Nigeria.*
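The Curation notes describe two threshold rules. A column is dropped when more than 80% of its values are missing, and a string column is cast when more than 85% of its values parse. Both rules can be sketched in pandas. This is a hedged illustration, not the curator's code, and it covers only the numeric case. It assumes the parse-success rate is computed over non-missing values, since the card does not say which denominator was used. `drop_sparse_columns` and `cast_if_parseable` are hypothetical helpers.

```python
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.80) -> pd.DataFrame:
    """Drop every column whose fraction of missing values exceeds max_missing."""
    missing = df.isna().mean()
    return df.drop(columns=missing[missing > max_missing].index)


def cast_if_parseable(s: pd.Series, min_success: float = 0.85) -> pd.Series:
    """Cast s to numeric when more than min_success of its values parse.

    ASSUMPTION: the success rate is measured over non-missing values only.
    After casting, values that failed to parse become NaN.
    """
    non_missing = s.dropna()
    if non_missing.empty:
        return s
    parsed = pd.to_numeric(non_missing, errors="coerce")
    if parsed.notna().mean() > min_success:
        return pd.to_numeric(s, errors="coerce")
    return s


# Made-up example: one sparse column, one mostly numeric string column.
raw = pd.DataFrame({
    "mostly_empty": [None] * 9 + ["x"],
    "codes": [str(i) for i in range(1, 10)] + ["oops"],
    "names": ["Koundou"] * 10,
})
kept = drop_sparse_columns(raw)                   # "mostly_empty" is 90% missing
kept["codes"] = cast_if_parseable(kept["codes"])  # 9 of 10 values parse
print(kept.dtypes)
```

The same pattern could be applied to datetime columns by swapping in `pd.to_datetime(..., errors="coerce")`. That is also an assumption about how the curator handled them.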
Provider: electricsheepafrica · 5,000+ quality datasets · 54 task types