electricsheepafrica/africa-world-bank-combined-indicators-for-ghana
Hugging Face · Updated 2026-04-11 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-world-bank-combined-indicators-for-ghana
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- tabular-classification
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- agriculture-livestock
- aid-effectiveness
- climate-weather
- development
- economics
- education
- energy
- environment
- gha
pretty_name: "Ghana - Economic, Social, Environmental, Health, Education, Development and Energy"
dataset_info:
  splits:
  - name: train
    num_examples: 51160
  - name: test
    num_examples: 12790
---
# Ghana - Economic, Social, Environmental, Health, Education, Development and Energy
**Publisher:** World Bank Group · **Source:** [HDX](https://data.humdata.org/dataset/world-bank-combined-indicators-for-ghana) · **License:** `cc-by` · **Updated:** 2026-03-27
---
## Abstract
Contains data from the World Bank's [data portal](http://data.worldbank.org/) covering the following topics, each of which also exists as an individual dataset on HDX: [Agriculture and Rural Development](https://data.humdata.org/dataset/world-bank-agriculture-and-rural-development-indicators-for-ghana), [Aid Effectiveness](https://data.humdata.org/dataset/world-bank-aid-effectiveness-indicators-for-ghana), [Economy and Growth](https://data.humdata.org/dataset/world-bank-economy-and-growth-indicators-for-ghana), [Education](https://data.humdata.org/dataset/world-bank-education-indicators-for-ghana), [Energy and Mining](https://data.humdata.org/dataset/world-bank-energy-and-mining-indicators-for-ghana), [Environment](https://data.humdata.org/dataset/world-bank-environment-indicators-for-ghana), [Financial Sector](https://data.humdata.org/dataset/world-bank-financial-sector-indicators-for-ghana), [Health](https://data.humdata.org/dataset/world-bank-health-indicators-for-ghana), [Infrastructure](https://data.humdata.org/dataset/world-bank-infrastructure-indicators-for-ghana), [Social Protection and Labor](https://data.humdata.org/dataset/world-bank-social-protection-and-labor-indicators-for-ghana), [Poverty](https://data.humdata.org/dataset/world-bank-poverty-indicators-for-ghana), [Private Sector](https://data.humdata.org/dataset/world-bank-private-sector-indicators-for-ghana), [Public Sector](https://data.humdata.org/dataset/world-bank-public-sector-indicators-for-ghana), [Science and Technology](https://data.humdata.org/dataset/world-bank-science-and-technology-indicators-for-ghana), [Social Development](https://data.humdata.org/dataset/world-bank-social-development-indicators-for-ghana), [Urban Development](https://data.humdata.org/dataset/world-bank-urban-development-indicators-for-ghana), [Gender](https://data.humdata.org/dataset/world-bank-gender-indicators-for-ghana), [Millennium Development Goals](https://data.humdata.org/dataset/world-bank-millenium-development-goals-indicators-for-ghana), [Climate Change](https://data.humdata.org/dataset/world-bank-climate-change-indicators-for-ghana), [External Debt](https://data.humdata.org/dataset/world-bank-external-debt-indicators-for-ghana), [Trade](https://data.humdata.org/dataset/world-bank-trade-indicators-for-ghana).
Each row is a country-level aggregate: the value of one indicator for Ghana in one year. Data was last updated on HDX on 2026-03-27. Geographic scope: **GHA**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Multi-sector development indicators (economy, health, education, environment, energy) |
| **Unit of observation** | Country-level aggregates |
| **Rows (total)** | 63,951 |
| **Columns** | 8 (2 numeric, 6 categorical, 0 datetime) |
| **Train split** | 51,160 rows |
| **Test split** | 12,790 rows |
| **Geographic scope** | GHA |
| **Publisher** | World Bank Group |
| **HDX last updated** | 2026-03-27 |
---
## Variables
**Geographic:** `country_name` (Ghana), `country_iso3` (GHA), `year` (range 1960–2025).
**Outcome / Measurement:** `value` (range -61362744410.0 to 3125280063460.0, pooled across all indicators).
**Identifier / Metadata:** `indicator_name` (sample values: Domestic credit to private sector (% of GDP); Population in urban agglomerations of more than 1 million (% of total population); Population in largest city), `indicator_code` (sample values, not paired with the names above: SM.POP.NETM, EN.URB.LCTY, EN.URB.LCTY.UR.ZS), `esa_source` (HDX), `esa_processed` (2026-04-11).
---
## Quick Start
```python
from datasets import load_dataset

# Download both splits from the Hugging Face Hub.
ds = load_dataset("electricsheepafrica/africa-world-bank-combined-indicators-for-ghana")

# Convert each split to a pandas DataFrame.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())
```
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `country_name` | object | 0.0% | Ghana |
| `country_iso3` | object | 0.0% | GHA |
| `year` | int64 | 0.0% | 1960 – 2025 (mean 1999.6678) |
| `indicator_name` | object | 0.0% | Domestic credit to private sector (% of GDP), Population in urban agglomerations of more than 1 million (% of total population), Population in largest city |
| `indicator_code` | object | 0.0% | SM.POP.NETM, EN.URB.LCTY, EN.URB.LCTY.UR.ZS |
| `value` | float64 | 0.0% | -61362744410.0 – 3125280063460.0 (mean 2358493718.5167) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-11 |
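The schema table can be turned into a quick sanity check after loading. The sketch below is illustrative only: `EXPECTED_KINDS` and `check_schema` are hypothetical names (not part of the dataset or of the `datasets` library), and the sample frame is synthetic.

```python
import pandas as pd
from pandas.api import types as ptypes

# Column -> broad type, transcribed from the schema table above.
# "text" stands in for the table's `object` dtype.
EXPECTED_KINDS = {
    "country_name": "text",
    "country_iso3": "text",
    "year": "integer",
    "indicator_name": "text",
    "indicator_code": "text",
    "value": "float",
    "esa_source": "text",
    "esa_processed": "text",
}

# One predicate per broad type.
KIND_CHECKS = {
    "text": lambda s: ptypes.is_string_dtype(s) or ptypes.is_object_dtype(s),
    "integer": ptypes.is_integer_dtype,
    "float": ptypes.is_float_dtype,
}


def check_schema(df: pd.DataFrame) -> list[str]:
    """Return a list of schema problems; an empty list means the frame matches."""
    problems = []
    for column, kind in EXPECTED_KINDS.items():
        if column not in df.columns:
            problems.append(f"missing column: {column}")
        elif not KIND_CHECKS[kind](df[column]):
            problems.append(f"{column}: expected {kind}, got {df[column].dtype}")
    return problems


# Synthetic one-row frame; the value is a placeholder, not a real observation.
sample = pd.DataFrame({
    "country_name": ["Ghana"],
    "country_iso3": ["GHA"],
    "year": [1960],
    "indicator_name": ["Population in largest city"],
    "indicator_code": ["EN.URB.LCTY"],
    "value": [1.0],
    "esa_source": ["HDX"],
    "esa_processed": ["2026-04-11"],
})
print(check_schema(sample))
```

With the real data, the same helper could be called as `check_schema(train)` on the frame produced in Quick Start.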
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `year` | 1960 | 2025 | 1999.6678 | 2003 |
| `value` | -61362744410.0 | 3125280063460.0 | 2358493718.5167 | 44.6 |
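The pooled `value` statistics combine indicators that are reported in different units (for example percentages and head counts), which is presumably why the mean and the median differ by many orders of magnitude. A per-indicator summary is usually more informative. The following is a minimal sketch; `summarize_by_indicator` is a hypothetical helper and the demo rows are synthetic.

```python
import pandas as pd


def summarize_by_indicator(df: pd.DataFrame) -> pd.DataFrame:
    """Min, max, mean and median of `value`, computed per indicator code."""
    return (
        df.groupby("indicator_code")["value"]
        .agg(["min", "max", "mean", "median"])
        .sort_index()
    )


# Synthetic rows for illustration only; codes and numbers are made up.
demo = pd.DataFrame({
    "indicator_code": ["A.PCT", "A.PCT", "A.PCT", "B.COUNT", "B.COUNT"],
    "value": [10.0, 20.0, 60.0, 1_000_000.0, 3_000_000.0],
})
summary = summarize_by_indicator(demo)
print(summary)
```

On the real data, `summarize_by_indicator(train)` would give one row per `indicator_code`.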
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. 18,727 exact duplicate rows were removed. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
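The steps described above can be approximated in pandas. This is a sketch under stated assumptions, not the publisher's actual pipeline: the function names are hypothetical, the case-insensitive marker matching and the regex-based snake_case rule are guesses, and the sampling call is only one way to get a seeded 80/20 split.

```python
import re

import numpy as np
import pandas as pd

# Missing-value markers listed in the Curation section (lowercased for matching).
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def to_snake_case(name: str) -> str:
    """Lowercase a column name and collapse non-alphanumeric runs into '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def unify_missing(cell):
    """Map a marker string to NaN; leave every other cell unchanged."""
    if isinstance(cell, str) and cell.strip().lower() in MISSING_MARKERS:
        return np.nan
    return cell


def clean(raw: pd.DataFrame) -> pd.DataFrame:
    """Rename columns, unify missing markers, drop exact duplicate rows."""
    df = raw.rename(columns=to_snake_case)
    df = df.apply(lambda col: col.map(unify_missing))
    return df.drop_duplicates().reset_index(drop=True)


def split_80_20(df: pd.DataFrame, seed: int = 42):
    """Random 80/20 split with a fixed seed (42 is the seed named in the text)."""
    train = df.sample(frac=0.8, random_state=seed)
    test = df.drop(train.index)
    return train, test


# Synthetic raw rows for illustration only.
raw = pd.DataFrame({
    "Country Name": ["Ghana"] * 6,
    "Year": [2000, 2000, 2001, 2002, 2003, 2004],
    "Value": ["1.5", "1.5", "N/A", "2.5", "-", "3.5"],
})
cleaned = clean(raw)
train_part, test_part = split_80_20(cleaned)
print(cleaned)
```

Each partition could then be written with `DataFrame.to_parquet(path, compression="snappy")`, which requires a Parquet engine such as `pyarrow`. Type conversion of the cleaned columns is left out of the sketch.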
---
## Limitations
- Data originates from the World Bank Group and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/world-bank-combined-indicators-for-ghana) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_world_bank_combined_indicators_for_ghana,
title = {Ghana - Economic, Social, Environmental, Health, Education, Development and Energy},
author = {World Bank Group},
year = {2026},
url = {https://data.humdata.org/dataset/world-bank-combined-indicators-for-ghana},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider:
electricsheepafrica
Collected summary
Dataset introduction

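Construction
In data science, a comprehensive and well-structured dataset is essential for understanding a country's development trajectory. This dataset is published by the World Bank Group; the raw data comes from its open data portal and covers indicators for Ghana across more than twenty key development areas, including agriculture, the economy, education, energy, the environment, and health. The Electric Sheep Africa team retrieved the data through the HDX platform's CKAN API and converted it to Parquet for efficient storage and reading. During construction, the team standardized column names, unified missing-value markers, and removed a large number of duplicate records, then split the data 80/20 into training and test sets so that it is ready for machine learning use.
Features
As a combined compilation of national-level development indicators for Ghana, the dataset stands out for its broad cross-domain coverage and its long time-series coverage. It contains more than 63,000 country-level aggregate records spanning over six decades, from 1960 to 2025. Each record holds a geographic identifier, a year, an indicator name and code, and the corresponding value. The structure is clear and has no missing values, which gives a solid basis for longitudinal and cross-sectional analysis. Topics range from macroeconomics and social development to climate change, giving a multidimensional picture of national development that is well suited to exploring long-term trends and relationships between development dimensions.
Usage
For researchers, the dataset is a valuable quantitative resource for multi-domain development research on Ghana and the wider West Africa region. It can be loaded with the Hugging Face `datasets` library and converted to a pandas DataFrame for further analysis. The dataset is already split into training and test sets and can be applied to machine learning tasks such as time-series forecasting, indicator classification, or development-pattern recognition. In practice, filter the indicators of a given domain by `indicator_code` or `indicator_name`, combine them with the `year` field for trend analysis, or use the `value` field to compare and model across indicators.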
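Filtering by `indicator_code` and ordering by `year` can be sketched as follows. `indicator_series` is a hypothetical helper and the demo rows carry placeholder numbers, not real observations. Because the train/test split is random, a single split holds only part of each series; concatenating both splits first (for example with `pd.concat([train, test])`) is likely the better starting point for trend analysis.

```python
import pandas as pd


def indicator_series(df: pd.DataFrame, code: str) -> pd.Series:
    """Return one indicator as a year-indexed Series, sorted by year."""
    rows = df.loc[df["indicator_code"] == code, ["year", "value"]]
    return rows.sort_values("year").set_index("year")["value"]


# Synthetic long-format rows; the codes are sample codes from the dataset card.
demo = pd.DataFrame({
    "indicator_code": ["EN.URB.LCTY", "SM.POP.NETM", "EN.URB.LCTY"],
    "year": [1961, 1960, 1960],
    "value": [2.0, 5.0, 1.0],
})

series = indicator_series(demo, "EN.URB.LCTY")
change = series.diff()  # year-over-year change; the first entry is NaN
print(series)
print(change)
```

The sketch assumes at most one row per indicator and year; if that does not hold in the real data, the rows would need to be aggregated before indexing by year.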
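Background and challenges
Background overview
As globalization and the sustainable development agenda deepen, systematic quantitative assessment of each country's social, economic, and environmental conditions has become a major concern for international organizations and academia. The World Bank Group, a leading producer of global development data, has long maintained indicator systems across many domains to support policy analysis and academic research. This dataset is published by the World Bank Group and was repackaged into a machine-learning-ready format by Electric Sheep Africa in 2026. It focuses on country-level aggregate data for Ghana from 1960 to 2025 and covers more than twenty key topics, including the economy, society, the environment, health, education, development, and energy. Its central purpose is to track Ghana's development trajectory through multidimensional indicators, providing an empirical basis for development economics, public policy, and interdisciplinary research, and a useful reference for understanding the development paths and challenges of African countries.
Current challenges
The dataset targets combined analysis and forecasting of multidimensional development indicators. The core challenge is integrating highly heterogeneous indicators with a wide temporal span into robust machine learning models. Domain challenges include large differences in units and scale between indicators, missing values and outliers in the time series, and extracting discriminative features from highly aggregated country-level data for classification or regression. Construction challenges stem mainly from the heterogeneity of the raw data: missing-value markers from different sources had to be unified and a large number of duplicate records removed. Data cleaning can fix formatting problems but cannot correct reporting bias, definitional inconsistencies, or methodological limitations in the original data, so users should consult the World Bank's methodology notes to avoid misinterpretation.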
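One common way to handle the scale differences mentioned above is to standardize `value` within each indicator before modelling. The sketch below shows a per-indicator z-score. It is one possible approach, not a method prescribed by the dataset's authors, and `zscore_by_indicator` is a hypothetical name. The demo rows are synthetic.

```python
import pandas as pd


def zscore_by_indicator(df: pd.DataFrame) -> pd.DataFrame:
    """Add `value_z`: `value` standardized within each indicator code."""
    out = df.copy()
    grouped = out.groupby("indicator_code")["value"]
    mean = grouped.transform("mean")
    std = grouped.transform("std")  # sample standard deviation (ddof=1)
    # A single-row or constant indicator has NaN or zero std; leave its z-score as NaN.
    out["value_z"] = (out["value"] - mean) / std.where(std > 0)
    return out


# Synthetic rows for illustration only; codes and numbers are made up.
demo = pd.DataFrame({
    "indicator_code": ["A", "A", "A", "B", "B", "C"],
    "value": [1.0, 2.0, 3.0, 100.0, 300.0, 5.0],
})
scaled = zscore_by_indicator(demo)
print(scaled)
```

To avoid leakage in a train/test setting, the per-indicator mean and standard deviation would normally be estimated on the training split and then applied to the test split; the sketch does not cover that step.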
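Common scenarios
Classic use cases
In African development economics and public policy research, the dataset serves as an integrated source of multidimensional country-level indicators for Ghana, and its classic applications are time-series analysis and cross-domain association modelling. Researchers often use its economic, social, and environmental data from 1960 to 2025 to build predictive models that simulate the effects of policy interventions, for example assessing the feasibility of sustainable development paths through cointegration analysis of energy consumption and economic growth indicators. Its structured form supports feature engineering for machine learning algorithms and provides a quantitative basis for studying complex system dynamics.
Academic problems addressed
The dataset addresses the problem of insufficient integration of cross-disciplinary indicators in development research. By systematically aggregating standardized World Bank data from more than twenty topic areas, it gives scholars an empirical basis for testing theoretical hypotheses. Its main academic value is that it removes the obstacle of fragmented data in macro-level analysis, allowing researchers to examine classic questions such as the causal relationship between education investment and health outcomes or the marginal effect of climate change on the agricultural economy. This multidimensional structure supports quantitative work in development economics and improves the rigor and reproducibility of policy evaluation.
Derived related work
Research derived from the dataset includes frameworks that assess progress toward the Sustainable Development Goals (SDGs) with panel data models, and open-source toolchains that predict economic vulnerability with machine learning methods. Scholars have used feature selection algorithms to identify the key drivers of Ghana's Human Development Index and have built threshold-effect models of educational equity and economic growth. This work adds to the methodology of development econometrics and lays a data foundation for later multi-country comparative analysis platforms for Africa.
The content above was collected and summarized by 遇见数据集.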



