electricsheepafrica/africa-climate-guinea-bissau
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: other
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- tabular-classification
- other
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- climate-weather
- conflict-violence
- economics
- education
- food-security
- hazards-and-risk
- health
- indicators
- gnb
pretty_name: "HDX HAPI Data for Guinea-Bissau"
dataset_info:
  splits:
  - name: train
    num_examples: 11086
  - name: test
    num_examples: 2771
---
# HDX HAPI Data for Guinea-Bissau
**Publisher:** HDX Humanitarian API Data · **Source:** [HDX](https://data.humdata.org/dataset/hdx-hapi-gnb) · **License:** `hdx-other` · **Updated:** 2026-02-18
---
## Abstract
This dataset contains data obtained from the
[HDX Humanitarian API](https://hapi.humdata.org/) (HDX HAPI),
which provides standardized humanitarian indicators drawn from
multiple sources and designed for seamless interoperability.
The data facilitates automated workflows and visualizations
to support humanitarian decision making.
For more information, please see the HDX HAPI
[landing page](https://data.humdata.org/hapi)
and
[documentation](https://hdx-hapi.readthedocs.io/en/latest/).
Each row in this dataset represents one geolocated point observation. Temporal coverage is given by the `reference_period_start` and `reference_period_end` columns. Geographic scope: **GNB**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Food security and nutrition |
| **Unit of observation** | Geolocated point observations |
| **Rows (total)** | 13,858 |
| **Columns** | 16 (3 numeric, 7 categorical, 4 boolean, 2 datetime) |
| **Train split** | 11,086 rows |
| **Test split** | 2,771 rows |
| **Geographic scope** | GNB |
| **Publisher** | HDX Humanitarian API Data |
| **HDX last updated** | 2026-02-18 |
---
## Variables
**Geographic:** `origin_location_code` (e.g. GNB, SEN, CIV), `origin_has_hrp`, `origin_in_gho`, `asylum_location_code` (e.g. GNB, DEU, CHE), `asylum_has_hrp`, `asylum_in_gho`.
**Temporal:** `reference_period_start`, `reference_period_end`.
**Demographic:** `population_group` (ASY, REF), `gender` (f, m, all), `age_range` (e.g. all, 0-4, 5-11), `min_age` (range 0.0–60.0), `max_age` (range 4.0–59.0), `population` (range 0–10061).
**Identifier / Metadata:** `esa_source` (HDX), `esa_processed` (2026-04-21).
---
## Quick Start
```python
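from datasets import load_dataset

# Download both splits from the Hugging Face Hub.
ds = load_dataset("electricsheepafrica/africa-climate-guinea-bissau")

# Convert each split to a pandas DataFrame.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())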
```
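
Once a split is in pandas, the sketch below shows one way to read the temporal coverage and to isolate aggregate rows. It assumes that rows with `gender == "all"` and `age_range == "all"` are totals over the disaggregated rows; this card does not state that, so it is worth checking against the HDX HAPI documentation. The helper names `summarize_coverage` and `totals_only` and the toy frame are illustrative, not part of the dataset.

```python
import pandas as pd


def summarize_coverage(df: pd.DataFrame) -> dict:
    """Return the earliest period start and the latest period end."""
    return {
        "start": df["reference_period_start"].min(),
        "end": df["reference_period_end"].max(),
    }


def totals_only(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows assumed to be aggregates (gender and age_range both 'all')."""
    mask = (df["gender"] == "all") & (df["age_range"] == "all")
    return df.loc[mask]


# Toy frame using the card's column names; the values are made up.
toy = pd.DataFrame({
    "gender": ["all", "f", "m", "all"],
    "age_range": ["all", "0-4", "0-4", "5-11"],
    "population": [10, 4, 6, 3],
    "reference_period_start": pd.to_datetime(
        ["2020-01-01", "2020-01-01", "2020-01-01", "2021-01-01"]
    ),
    "reference_period_end": pd.to_datetime(
        ["2020-12-31", "2020-12-31", "2020-12-31", "2021-12-31"]
    ),
})

coverage = summarize_coverage(toy)
totals = totals_only(toy)
```

If the assumption holds, summing `population` over `totals_only(train)` avoids counting the same people once in the total row and again in the disaggregated rows.
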
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `origin_location_code` | object | 0.0% | GNB, SEN, CIV |
| `origin_has_hrp` | bool | 0.0% | |
| `origin_in_gho` | bool | 0.0% | |
| `asylum_location_code` | object | 0.0% | GNB, DEU, CHE |
| `asylum_has_hrp` | bool | 0.0% | |
| `asylum_in_gho` | bool | 0.0% | |
| `population_group` | object | 0.0% | ASY, REF |
| `gender` | object | 0.0% | f, m, all |
| `age_range` | object | 0.0% | all, 0-4, 5-11 |
| `min_age` | float64 | 23.1% | 0.0 – 60.0 (mean 19.0) |
| `max_age` | float64 | 38.5% | 4.0 – 59.0 (mean 22.75) |
| `population` | int64 | 0.0% | 0.0 – 10061.0 (mean 38.2123) |
| `reference_period_start` | datetime64[ns] | 0.0% | |
| `reference_period_end` | datetime64[ns] | 0.0% | |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-21 |
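
The table above can double as a lightweight contract. The following is a minimal sketch of a check that a loaded split has the documented columns and broadly the documented types. It compares by kind (text, bool, float, int, datetime) rather than exact dtype, on the assumption that the exact dtype may vary with the pandas version and loading path. `EXPECTED_KINDS` is transcribed from the table; `schema_problems` and the one-row toy frame are hypothetical.

```python
import pandas as pd
from pandas.api import types as ptypes

# Kind of each column, transcribed from the schema table.
EXPECTED_KINDS = {
    "origin_location_code": "text",
    "origin_has_hrp": "bool",
    "origin_in_gho": "bool",
    "asylum_location_code": "text",
    "asylum_has_hrp": "bool",
    "asylum_in_gho": "bool",
    "population_group": "text",
    "gender": "text",
    "age_range": "text",
    "min_age": "float",
    "max_age": "float",
    "population": "int",
    "reference_period_start": "datetime",
    "reference_period_end": "datetime",
    "esa_source": "text",
    "esa_processed": "text",
}

CHECKS = {
    "text": lambda s: ptypes.is_object_dtype(s) or ptypes.is_string_dtype(s),
    "bool": ptypes.is_bool_dtype,
    "float": ptypes.is_float_dtype,
    "int": ptypes.is_integer_dtype,
    "datetime": ptypes.is_datetime64_any_dtype,
}


def schema_problems(df: pd.DataFrame) -> list:
    """Return human-readable mismatches; an empty list means the frame conforms."""
    problems = []
    for column, kind in EXPECTED_KINDS.items():
        if column not in df.columns:
            problems.append(f"missing column: {column}")
        elif not CHECKS[kind](df[column]):
            problems.append(f"{column}: expected {kind}, got {df[column].dtype}")
    return problems


# One made-up row that follows the documented schema.
toy = pd.DataFrame({
    "origin_location_code": ["GNB"],
    "origin_has_hrp": [False],
    "origin_in_gho": [False],
    "asylum_location_code": ["DEU"],
    "asylum_has_hrp": [False],
    "asylum_in_gho": [False],
    "population_group": ["REF"],
    "gender": ["all"],
    "age_range": ["all"],
    "min_age": [float("nan")],
    "max_age": [float("nan")],
    "population": [5],
    "reference_period_start": pd.to_datetime(["2020-01-01"]),
    "reference_period_end": pd.to_datetime(["2020-12-31"]),
    "esa_source": ["HDX"],
    "esa_processed": ["2026-04-21"],
})

problems = schema_problems(toy)
```
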
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `min_age` | 0.0 | 60.0 | 19.0 | 12.0 |
| `max_age` | 4.0 | 59.0 | 22.75 | 14.0 |
| `population` | 0.0 | 10061.0 | 38.2123 | 0.0 |
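
The summary can be recomputed from a loaded split. The sketch below assumes the three numeric columns listed in the table; `numeric_summary` is a hypothetical helper, and the toy values are made up.

```python
import pandas as pd

NUMERIC_COLUMNS = ["min_age", "max_age", "population"]


def numeric_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Min, max, mean and median per numeric column; nulls are skipped."""
    return df[NUMERIC_COLUMNS].agg(["min", "max", "mean", "median"]).T


# Toy values chosen so that the population column is zero-heavy.
toy = pd.DataFrame({
    "min_age": [0.0, 5.0, None, 18.0],
    "max_age": [4.0, 11.0, None, 59.0],
    "population": [0, 0, 0, 10],
})

summary = numeric_summary(toy)
```

A `population` median of 0.0 next to a mean of 38.2123 means at least half of the rows report zero people, so the column is strongly right-skewed and a few large counts drive the mean.
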
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. Two columns were cast from string to numeric or datetime because more than 85% of their values parsed successfully. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
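
The steps described above can be sketched as follows. This is an illustration under stated assumptions, not the code Electric Sheep Africa ran: marker matching is assumed to be case-insensitive, dates are assumed to be ISO `YYYY-MM-DD`, numeric parsing is tried before datetime parsing, and the split is assumed to use `DataFrame.sample`. All function names and the toy input are hypothetical.

```python
import re

import numpy as np
import pandas as pd
from pandas.api import types as ptypes

# Markers listed in the card, lowercased (case-insensitive matching is an assumption).
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}
PARSE_THRESHOLD = 0.85      # share of values that must parse before a cast
DATE_FORMAT = "%Y-%m-%d"    # assumed date layout


def to_snake_case(name: str) -> str:
    """'Origin Location Code' -> 'origin_location_code'."""
    name = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    return name.strip("_").lower()


def unify_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Replace the known missing-value markers with NaN."""
    def _clean(value):
        if isinstance(value, str) and value.strip().lower() in MISSING_MARKERS:
            return np.nan
        return value
    return pd.DataFrame({name: df[name].map(_clean) for name in df.columns})


def cast_if_parseable(series: pd.Series) -> pd.Series:
    """Cast a text column to numeric or datetime if enough values parse."""
    if ptypes.is_numeric_dtype(series) or ptypes.is_datetime64_any_dtype(series):
        return series
    non_null = series.dropna()
    if non_null.empty:
        return series
    if pd.to_numeric(non_null, errors="coerce").notna().mean() > PARSE_THRESHOLD:
        return pd.to_numeric(series, errors="coerce")
    parsed = pd.to_datetime(non_null, format=DATE_FORMAT, errors="coerce")
    if parsed.notna().mean() > PARSE_THRESHOLD:
        return pd.to_datetime(series, format=DATE_FORMAT, errors="coerce")
    return series


def split_train_test(df: pd.DataFrame, test_fraction: float = 0.2, seed: int = 42):
    """Seeded random split; relies on the index being unique."""
    test = df.sample(frac=test_fraction, random_state=seed)
    train = df.drop(index=test.index)
    return train, test


# Made-up raw input with untidy headers and one missing-value marker.
raw = pd.DataFrame({
    "Origin Location Code": ["GNB", "SEN", "GNB", "CIV", "GNB",
                             "GNB", "SEN", "GNB", "GNB", "GNB"],
    "Population": ["5", "12", "N/A", "0", "7", "3", "1", "9", "2", "4"],
    "Reference Period Start": ["2020-01-01"] * 10,
})

clean = unify_missing(raw.rename(columns=to_snake_case))
clean = pd.DataFrame({name: cast_if_parseable(clean[name]) for name in clean.columns})
train, test = split_train_test(clean)
```

Each split could then be written with `DataFrame.to_parquet(path, compression="snappy")`, which requires `pyarrow` or `fastparquet` to be installed.
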
---
## Limitations
- Data originates from HDX Humanitarian API Data and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- The following columns have >20% missing values and should be treated with caution in modelling: `min_age`, `max_age`.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/hdx-hapi-gnb) for the publisher's own methodology notes and caveats.
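
The missing-value caveat above can be checked directly on a loaded split. The sketch below flags columns whose null share exceeds a threshold; the 20% default mirrors the figure quoted in the list, while `high_missing_columns` and the toy frame are illustrative.

```python
import pandas as pd


def high_missing_columns(df: pd.DataFrame, threshold: float = 0.20) -> dict:
    """Return {column: null share} for columns whose null share exceeds threshold."""
    shares = df.isna().mean()
    return shares[shares > threshold].round(3).to_dict()


# Made-up frame in which the two age columns are sparsely populated.
toy = pd.DataFrame({
    "min_age": [0.0, None, 18.0, None],
    "max_age": [4.0, None, None, None],
    "population": [1, 2, 3, 4],
})

flagged = high_missing_columns(toy)
```

For age-based modelling, one option is to restrict the frame to rows where both bounds are present, for example with `df.dropna(subset=["min_age", "max_age"])`.
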
---
## Citation
```bibtex
@dataset{hdx_africa_climate_guinea_bissau,
title = {HDX HAPI Data for Guinea-Bissau},
author = {HDX Humanitarian API Data},
year = {2026},
url = {https://data.humdata.org/dataset/hdx-hapi-gnb},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*



