electricsheepafrica/africa-unhabitat-gm-indicators
Hugging Face · Updated 2026-04-11 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-unhabitat-gm-indicators
Resource summary:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- tabular-classification
- other
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- baseline-population
- education
- health
- hxl
- indicators
- transportation
- gmb
pretty_name: "Gambia - Demographic, Health, Education and Transport indicators"
dataset_info:
splits:
- name: train
num_examples: 48
- name: test
num_examples: 12
---
# Gambia - Demographic, Health, Education and Transport indicators
**Publisher:** United Nations Human Settlements Programmes, Data and Analytics Section · **Source:** [HDX](https://data.humdata.org/dataset/unhabitat-gm-indicators) · **License:** `cc-by-igo` · **Updated:** 2024-03-28
---
## Abstract
The urban indicator data available here are analyzed, compiled and published by UN-Habitat’s Global Urban Observatory, which supports governments, local authorities and civil society organizations in developing urban indicators, data and statistics. Urban statistics are collected through household surveys and censuses conducted by national statistics authorities, and the Global Urban Observatory team analyzes and compiles urban indicator statistics from these surveys and censuses. In addition, local urban observatories collect, compile and analyze urban data for national policy development. Population statistics come from the United Nations Department of Economic and Social Affairs (UN DESA) World Urbanization Prospects.
Each row is an observation for a first-level administrative unit. The data were last updated on HDX on 2024-03-28. Geographic scope: **GMB**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Public health |
| **Unit of observation** | First-level administrative unit observations |
| **Rows (total)** | 61 |
| **Columns** | 13 (5 numeric, 8 categorical, 0 datetime) |
| **Train split** | 48 rows |
| **Test split** | 12 rows |
| **Geographic scope** | GMB |
| **Publisher** | United Nations Human Settlements Programmes, Data and Analytics Section |
| **HDX last updated** | 2024-03-28 |
---
## Variables
**Geographic** — `category` (Population, Slum dwellers, #meta+category), `indicator_friendly` (Total population, Average annual rate of change of population – Urban, Proportion of urban population living in slum area), `type_data` (p, 1000, #indicator+type), `latitude` (range 13.28 to 13.4539), `longitude` (range -16.5917 to -16.34) and 3 others (`region_id`, `country_id`, `year`).
**Outcome / Measurement** — `value` (range 0.55–3649.0).
**Identifier / Metadata** — `name` (Gambia, Banjul, #country+name), `esa_source` (HDX), `esa_processed` (2026-04-11).
**Other** — `indicator` (population, avg_annual_rate_change_percentage_urban, urban_population_living_in_slum).
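Because every indicator shares the single `value` column, the table appears to be in long format (one row per `name`, `indicator` and `year`). A common first step is to pivot it so each indicator code becomes its own column. This is a minimal pandas sketch, assuming at most one row per (`name`, `year`, `indicator`) combination, which is why it uses `aggfunc="first"`; the helper name `to_wide` is hypothetical and the toy values are illustrative, not taken from the dataset.

```python
import pandas as pd


def to_wide(df: pd.DataFrame) -> pd.DataFrame:
    """Pivot long-format rows into one row per (name, year) and one column per indicator.

    Assumes at most one value per (name, year, indicator); "first" keeps that value.
    """
    wide = df.pivot_table(
        index=["name", "year"], columns="indicator", values="value", aggfunc="first"
    )
    wide.columns.name = None  # drop the "indicator" label from the column axis
    return wide.reset_index()


# Illustrative rows shaped like the schema, not real values.
toy = pd.DataFrame(
    {
        "name": ["Gambia", "Gambia", "Banjul"],
        "year": [2020.0, 2020.0, 2020.0],
        "indicator": ["population", "avg_annual_rate_change_percentage_urban", "population"],
        "value": [100.0, 3.0, 5.0],
    }
)
wide = to_wide(toy)
print(wide)
```

Combinations that are absent in the long table (here, the rate of change for Banjul) come out as `NaN` in the wide table.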
---
## Quick Start
```python
from datasets import load_dataset
ds = load_dataset("electricsheepafrica/africa-unhabitat-gm-indicators")
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()
print(train.shape)
train.head()
```
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `category` | object | 0.0% | Population, Slum dwellers, #meta+category |
| `indicator` | object | 0.0% | population, avg_annual_rate_change_percentage_urban, urban_population_living_in_slum |
| `indicator_friendly` | object | 0.0% | Total population, Average annual rate of change of population – Urban, Proportion of urban population living in slum area |
| `type_data` | object | 0.0% | p, 1000, #indicator+type |
| `latitude` | float64 | 1.6% | 13.28 – 13.4539 (mean 13.2887) |
| `longitude` | float64 | 1.6% | -16.5917 – -16.34 (mean -16.3526) |
| `region_id` | float64 | 1.6% | 289.0 – 289.0 (mean 289.0) |
| `country_id` | object | 0.0% | GM, #country+code+v_iso2 |
| `name` | object | 0.0% | Gambia, Banjul, #country+name |
| `year` | float64 | 1.6% | 1950.0 – 2050.0 (mean 2000.8) |
| `value` | float64 | 1.6% | 0.55 – 3649.0 (mean 623.7302) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-11 |
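Sample values such as `#meta+category`, `#indicator+type`, `#country+name` and `#country+code+v_iso2` look like HXL hashtags. The 1.6% null rate in every numeric column (1 of 61 rows) is consistent with the HXL tag row having been kept as an ordinary data row. Assuming that is the case, a minimal sketch to drop such rows before analysis could look like this. The helper name `drop_hxl_rows` is hypothetical, and the toy frame is illustrative only.

```python
import pandas as pd


def drop_hxl_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Remove rows in which any cell is a string starting with '#' (an HXL hashtag).

    Assumption: the HXL tag row was kept as an ordinary data row during conversion.
    """
    is_tag_cell = df.apply(
        lambda col: col.map(lambda v: isinstance(v, str) and v.startswith("#"))
    )
    return df.loc[~is_tag_cell.any(axis=1)].reset_index(drop=True)


# Toy frame shaped like the schema above (values are illustrative only).
toy = pd.DataFrame(
    {
        "category": ["#meta+category", "Population"],
        "name": ["#country+name", "Gambia"],
        "year": [None, 2020.0],
        "value": [None, 100.0],
    }
)
clean = drop_hxl_rows(toy)
print(len(clean))  # 1
```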
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `latitude` | 13.28 | 13.4539 | 13.2887 | 13.28 |
| `longitude` | -16.5917 | -16.34 | -16.3526 | -16.34 |
| `region_id` | 289.0 | 289.0 | 289.0 | 289.0 |
| `year` | 1950.0 | 2050.0 | 2000.8 | 2005.0 |
| `value` | 0.55 | 3649.0 | 623.7302 | 55.2 |
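The gap between the mean (623.7302) and the median (55.2) of `value` is what one would expect when a single column pools indicators measured in different units. `type_data` appears to encode the unit (`p` plausibly a percentage and `1000` plausibly thousands of people, though both readings are assumptions to check against the HDX page). Summarizing per `indicator` is therefore safer than pooling. A minimal pandas sketch follows; `summarize_by_indicator` is a hypothetical helper and the toy numbers are not dataset values.

```python
import pandas as pd


def summarize_by_indicator(df: pd.DataFrame) -> pd.DataFrame:
    """Count/min/median/max of `value` per indicator, keeping `type_data` as a unit hint."""
    return (
        df.dropna(subset=["value"])
        .groupby(["indicator", "type_data"])["value"]
        .agg(["count", "min", "median", "max"])
        .reset_index()
    )


# Illustrative rows only, not taken from the dataset.
toy = pd.DataFrame(
    {
        "indicator": ["population", "population", "urban_population_living_in_slum"],
        "type_data": ["1000", "1000", "p"],
        "value": [1000.0, 2000.0, 30.0],
    }
)
summary = summarize_by_indicator(toy)
print(summary)
```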
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. Five columns were cast from string to numeric or datetime because their parse-success rate exceeded the 85% threshold. The dataset was then split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
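The curation steps above can be approximated with the standard library alone. This is a sketch under assumptions, not Electric Sheep Africa's actual pipeline. The snake_case regex, the case-insensitive marker matching and the helper names (`to_snake_case`, `normalize_missing`, `should_cast_numeric`) are all hypothetical; only the marker list and the 85% threshold come from the description. Datetime casting and the train/test split are omitted.

```python
import math
import re

# Markers from the curation notes; matched case-insensitively here (an assumption).
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}
PARSE_THRESHOLD = 0.85  # the ">85%" parse-success threshold from the curation notes


def to_snake_case(name: str) -> str:
    """Lowercase a column name and collapse runs of characters outside [0-9a-z] to '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def normalize_missing(value):
    """Map a known missing-value marker to NaN; return any other value unchanged."""
    if isinstance(value, str) and value.strip().lower() in MISSING_MARKERS:
        return math.nan
    return value


def is_missing(value) -> bool:
    """True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def should_cast_numeric(values) -> bool:
    """True if more than PARSE_THRESHOLD of the non-missing values parse as float."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        return False
    parsed = 0
    for v in present:
        try:
            float(v)
            parsed += 1
        except (TypeError, ValueError):
            pass
    return parsed / len(present) > PARSE_THRESHOLD


# Example: 60 numeric strings plus one hypothetical HXL tag cell (60/61 is about 0.98).
column = [str(1950 + i) for i in range(60)] + ["#date+year"]
print(to_snake_case("Indicator Friendly"))  # indicator_friendly
print(should_cast_numeric(column))  # True
```

Under this rule, a column holding a single leftover HXL tag among otherwise numeric strings still passes the threshold. If the cast then coerces unparseable cells to `NaN` (as `pandas.to_numeric(..., errors="coerce")` does), the tag cell becomes a null.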
---
## Limitations
- Data originates from the United Nations Human Settlements Programmes, Data and Analytics Section and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/unhabitat-gm-indicators) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_unhabitat_gm_indicators,
title = {Gambia - Demographic, Health, Education and Transport indicators},
author = {{United Nations Human Settlements Programmes, Data and Analytics Section}},
year = {2024},
url = {https://data.humdata.org/dataset/unhabitat-gm-indicators},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider: electricsheepafrica
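## Dataset Introduction

### Construction
This dataset comes from the field of urbanization monitoring in Africa. It was built by UN-Habitat's Global Urban Observatory from household surveys and population censuses conducted by national statistics authorities. The observatory team systematically analyzes and compiles these raw data into urban indicators spanning population, health, education and transport. The unit of observation is the first-level administrative division, and each record is a statistical observation for one administrative unit. Electric Sheep Africa then retrieved the raw data from HDX through the CKAN API and applied standardized cleaning and format conversion, producing Parquet files suitable for machine learning.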
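### Characteristics
This dataset focuses on urban development indicators for the Gambia and integrates structured information across several dimensions. It contains 61 records and 13 variables (5 numeric, 8 categorical) covering key indicators such as total population, the proportion of the urban population living in slums, and the average annual rate of change of the urban population. Geographic coordinates and year fields provide a consistent spatial and temporal frame, while unified missing-value handling and standardized column naming keep the data tidy. The dataset is pre-split into training and test sets, giving a ready-made benchmark split, and its compact size and clear schema suit quick exploration and validation.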
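### Usage
For machine learning and data analysis, the dataset can be loaded directly with the Hugging Face `datasets` library and converted to a pandas DataFrame for further processing. Researchers can use the train/test split for regression, classification or spatiotemporal analysis, particularly trend analysis and regional comparison of urban development indicators. Users should consult the original publisher's methodology notes and keep the data's limitations in mind. They should also subset the data and engineer features using the geographic and year fields, so that the data can support public policy analysis and sustainable urban research.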
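For example, subsetting on the geographic and year fields might look like the following in pandas. `Gambia` and `population` appear among the sample values in the schema, but whether a given pair has a multi-year series should be checked against the data. The helper name `indicator_series` is hypothetical and the toy values are illustrative.

```python
import pandas as pd


def indicator_series(df: pd.DataFrame, name: str, indicator: str) -> pd.Series:
    """Return one indicator for one unit as a `value` series indexed by integer year."""
    mask = (df["name"] == name) & (df["indicator"] == indicator)
    subset = df.loc[mask].dropna(subset=["year", "value"])
    return (
        subset.assign(year=subset["year"].astype(int))
        .set_index("year")["value"]
        .sort_index()
    )


# Illustrative rows only, not taken from the dataset.
toy = pd.DataFrame(
    {
        "name": ["Gambia", "Banjul", "Gambia"],
        "indicator": ["population"] * 3,
        "year": [2000.0, 2000.0, 1990.0],
        "value": [10.0, 1.0, 8.0],
    }
)
s = indicator_series(toy, "Gambia", "population")
print(s)
```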
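## Background and Challenges

### Background
Global urbanization is accelerating, and UN-Habitat's Global Urban Observatory collects and analyzes data systematically to give policymakers key urban development indicators. The `africa-unhabitat-gm-indicators` dataset was published in 2024 by the Data and Analytics Section of the United Nations Human Settlements Programme and repackaged in a machine-learning-ready format by Electric Sheep Africa. It focuses on first-level administrative units in the Gambia and covers statistics on population, health, education and transport, with the aim of supporting sustainable urban development and public policy research. Its core research question is how to quantify and monitor urban population dynamics, the proportion of slum dwellers, and access to infrastructure services. It therefore offers an empirical basis for assessing urbanization and designing interventions in Africa, and is a useful reference for regional development planning and humanitarian action.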
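### Current Challenges
The dataset addresses the multidimensional quantification and monitoring of urban development indicators. Its core challenge is capturing the complex dynamics of urbanization in the Gambia from a limited set of administrative-unit observations, such as population growth rates and changes in slum living conditions. Challenges in its construction stem mainly from the heterogeneity and quality constraints of the raw data. Statistical definitions may differ across years and sources, and an automated cleaning pipeline cannot correct misreported values or sampling bias in the original collection. Moreover, with only 61 records the dataset is small, which may limit the generalization and robustness of machine learning models and constrains high-precision prediction tasks.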
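## Common Use Cases

### Classic Use Case
In African regional development research, the dataset gives researchers population, health, education and transport indicators for first-level administrative divisions of the Gambia. Its classic use is supporting the analysis of sustainable urban development. Researchers can build regression or classification models on these structured indicators to forecast urban population trends or assess infrastructure needs, revealing underlying patterns of regional development.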
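As a baseline for the trend-forecasting use described above, an ordinary least-squares straight line can be fitted to one indicator's series with `numpy.polyfit`. This is a deliberately simple sketch, not a method taken from the source. With only a handful of points per series (61 rows in total), any extrapolation is rough. Because `year` runs to 2050 and population figures come from World Urbanization Prospects, later years are presumably projections rather than observations, which matters when choosing the fitting window. The helper name `linear_trend` and the numbers below are illustrative.

```python
import numpy as np


def linear_trend(years, values):
    """Least-squares fit of value = slope * year + intercept; returns (slope, intercept)."""
    x = np.asarray(years, dtype=float)
    y = np.asarray(values, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)


# Illustrative numbers only, not dataset values.
slope, intercept = linear_trend([2000, 2010, 2020], [10.0, 12.0, 14.0])
forecast_2030 = slope * 2030 + intercept
print(round(slope, 3), round(forecast_2030, 3))  # 0.2 16.0
```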
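### Practical Applications
In practice, government agencies and non-profit organizations use the dataset to develop regional development plans and resource-allocation strategies. For example, by combining population and health indicators, decision-makers can identify underserved areas and optimize the placement of schools and health facilities. Transport data support prioritizing infrastructure investment, improving urban governance and helping localize the UN Sustainable Development Goals.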
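### Derived Work
Several well-known studies have been built around this dataset, including machine-learning models for predicting urban poverty and frameworks for spatiotemporal data analysis. This work extends the data's use to research on regional inequality and the evaluation of climate-change adaptation strategies. It has also encouraged standardized processing pipelines for similar datasets, offering a methodological reference for building urban observatory networks in Africa.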
The content above was collected and summarized by 遇见数据集.



