electricsheepafrica/africa-world-bank-gender-indicators-for-ghana
Source: Hugging Face · Updated 2026-04-11 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-world-bank-gender-indicators-for-ghana
Resource overview:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- gender
- indicators
- gha
pretty_name: "Ghana - Gender"
dataset_info:
splits:
- name: train
num_examples: 3959
- name: test
num_examples: 989
---
# Ghana - Gender
**Publisher:** World Bank Group · **Source:** [HDX](https://data.humdata.org/dataset/world-bank-gender-indicators-for-ghana) · **License:** `cc-by` · **Updated:** 2026-03-27
---
## Abstract
Contains data from the World Bank's [data portal](http://data.worldbank.org/). There is also a [consolidated country dataset](https://data.humdata.org/dataset/world-bank-combined-indicators-for-ghana) on HDX.
Gender equality is a core development objective in its own right. It is also smart development policy and sound business practice. It is integral to economic growth, business growth and good development outcomes. Gender equality can boost productivity, enhance prospects for the next generation, build resilience, and make institutions more representative and effective. In December 2015, the World Bank Group Board discussed its new Gender Equality Strategy 2016–2023, which aims to address persistent gaps and proposes a sharper focus on more and better gender data. The Bank Group is continually scaling up commitments and expanding partnerships to fill significant gaps in gender data. The database hosts the latest sex-disaggregated data and gender statistics covering demography, education, health, access to economic opportunities, public life and decision-making, and agency.
Each row is a single country-level aggregate: the value of one indicator for one year. Data was last updated on HDX on 2026-03-27. Geographic scope: **GHA**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Public health |
| **Unit of observation** | Country-level aggregates |
| **Rows (total)** | 4,949 |
| **Columns** | 8 (2 numeric, 6 categorical, 0 datetime) |
| **Train split** | 3,959 rows |
| **Test split** | 989 rows |
| **Geographic scope** | GHA |
| **Publisher** | World Bank Group |
| **HDX last updated** | 2026-03-27 |
---
## Variables
**Geographic / Temporal:** `country_name` (Ghana), `country_iso3` (GHA), `year` (1960–2025).
**Outcome / Measurement:** `value` (0.0–6311681.0).
**Identifier / Metadata:** `indicator_name` (sample values: "Age population, age 00, female"; "Age population, age 03, male"; "Age population, age 03, female"), `indicator_code` (sample values: SP.POP.AG00.FE.IN, SP.POP.AG03.MA.IN, SP.POP.AG03.FE.IN), `esa_source` (HDX), `esa_processed` (2026-04-11).
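Each row holds one `value` for one `indicator_code` in one `year`, so the table is in long format. The sketch below reshapes it into one column per indicator using pandas. `to_wide` is a hypothetical helper. The three rows are placeholders in the card's schema, not figures from the dataset.

```python
import pandas as pd


def to_wide(df: pd.DataFrame) -> pd.DataFrame:
    """Reshape long rows (year, indicator_code, value) into a year x indicator table."""
    # pivot raises ValueError if a (year, indicator_code) pair occurs more than once
    return df.pivot(index="year", columns="indicator_code", values="value").sort_index()


# Hypothetical rows following the card's column names; the numbers are placeholders.
long_df = pd.DataFrame(
    {
        "year": [2000, 2000, 2001],
        "indicator_code": ["SP.POP.AG03.FE.IN", "SP.POP.AG03.MA.IN", "SP.POP.AG03.FE.IN"],
        "value": [1.0, 2.0, 3.0],
    }
)
wide = to_wide(long_df)
print(wide)
```

`DataFrame.pivot` is deliberately strict here. If the one-row-per-indicator-year assumption ever fails, it raises instead of silently aggregating.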
---
## Quick Start
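```python
from datasets import load_dataset

# Download both splits from the Hugging Face Hub
ds = load_dataset("electricsheepafrica/africa-world-bank-gender-indicators-for-ghana")

# Convert each split to a pandas DataFrame
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)   # (rows, columns) of the training split
print(train.head())  # first five rows
```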
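Curation describes the 80/20 split as using a fixed random seed, which suggests a shuffled row split. If so, each indicator's test rows are scattered across years rather than forming a later hold-out period. The sketch below rebuilds one indicator's complete annual series by pooling both splits. `full_series` is a hypothetical helper. The toy frames stand in for the `train` and `test` DataFrames from the Quick Start, and their values are placeholders.

```python
import pandas as pd


def full_series(train: pd.DataFrame, test: pd.DataFrame, code: str) -> pd.Series:
    """Return one indicator's values indexed by year, pooled from both splits."""
    both = pd.concat([train, test], ignore_index=True)
    rows = both[both["indicator_code"] == code]
    return rows.set_index("year")["value"].sort_index()


# Toy splits with the card's column names; the values are placeholders.
train = pd.DataFrame(
    {"year": [2002, 2000], "indicator_code": ["SP.POP.AG00.FE.IN"] * 2, "value": [3.0, 1.0]}
)
test = pd.DataFrame({"year": [2001], "indicator_code": ["SP.POP.AG00.FE.IN"], "value": [2.0]})

series = full_series(train, test, "SP.POP.AG00.FE.IN")
print(series.to_dict())  # {2000: 1.0, 2001: 2.0, 2002: 3.0}
```

For forecasting experiments, you may prefer to re-split the pooled rows by year (e.g. train on earlier years, test on later ones) rather than reuse the random partition.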
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `country_name` | object | 0.0% | Ghana |
| `country_iso3` | object | 0.0% | GHA |
| `year` | int64 | 0.0% | 1960 – 2025 (mean 2000.7399) |
| `indicator_name` | object | 0.0% | "Age population, age 00, female"; "Age population, age 03, male"; "Age population, age 03, female" |
| `indicator_code` | object | 0.0% | SP.POP.AG00.FE.IN, SP.POP.AG03.MA.IN, SP.POP.AG03.FE.IN |
| `value` | float64 | 0.0% | 0.0 – 6311681.0 (mean 81473.0218) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-11 |
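The Type and Null % columns can be recomputed from a loaded split as a quick sanity check. Below is a minimal sketch. `schema_summary` is a hypothetical helper, and the two-row frame is a placeholder rather than dataset content.

```python
import numpy as np
import pandas as pd


def schema_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype and percentage of missing cells, mirroring the Schema table."""
    return pd.DataFrame(
        {
            "type": df.dtypes.astype(str),
            "null_pct": (df.isna().mean() * 100).round(1),
        }
    )


frame = pd.DataFrame({"year": [2000, 2001], "value": [1.5, np.nan]})
summary = schema_summary(frame)
print(summary)
```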
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `year` | 1960 | 2025 | 2000.7399 | 2004 |
| `value` | 0.0 | 6311681.0 | 81473.0218 | 48.5284 |
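`value` pools every indicator, so the whole-column statistics above mix different quantities. The large gap between the mean (81473.0218) and the median (48.5284) is consistent with large population counts sitting alongside much smaller values. A per-indicator summary is usually more informative. The sketch below groups by `indicator_code` with pandas. `per_indicator_summary` is a hypothetical helper, and the codes `A` and `B` and their numbers are placeholders.

```python
import pandas as pd


def per_indicator_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Min, max, mean, median and row count of `value` for each indicator_code."""
    return (
        df.groupby("indicator_code")["value"]
        .agg(["min", "max", "mean", "median", "count"])
        .sort_index()
    )


# Placeholder rows in the card's schema (hypothetical codes, not real figures).
frame = pd.DataFrame({"indicator_code": ["A", "A", "B"], "value": [10.0, 30.0, 0.5]})
stats = per_indicator_summary(frame)
print(stats)
```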
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
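The curation steps can be approximated in a few lines of pandas. This is a sketch under stated assumptions, not ESA's actual script, which the card does not publish. The following are all assumptions:

- the snake_case rule;
- case-insensitive matching of the missing-value markers;
- use of `DataFrame.sample` for the seeded 80/20 split.

The CKAN download step is omitted. Writing Parquet needs an engine such as pyarrow.

```python
import re

import numpy as np
import pandas as pd

# Markers listed in the Curation section; matched case-insensitively (an assumption).
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def to_snake_case(name: str) -> str:
    """Lowercase a column name and turn runs of other characters into underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Rename columns to snake_case and replace missing-value markers with NaN."""
    out = df.rename(columns=to_snake_case)
    for col in out.columns:
        is_marker = out[col].map(
            lambda v: isinstance(v, str) and v.strip().lower() in MISSING_MARKERS
        ).astype(bool)
        if is_marker.any():  # only touch columns that contain a marker
            out[col] = out[col].mask(is_marker, np.nan)
    return out


def split_80_20(df: pd.DataFrame, seed: int = 42):
    """Seeded random 80/20 row split into (train, test)."""
    train = df.sample(frac=0.8, random_state=seed)
    test = df.drop(train.index)
    return train.reset_index(drop=True), test.reset_index(drop=True)


def save_snappy(train: pd.DataFrame, test: pd.DataFrame, prefix: str) -> None:
    """Write both splits as Snappy-compressed Parquet (needs pyarrow or fastparquet)."""
    train.to_parquet(f"{prefix}_train.parquet", compression="snappy", index=False)
    test.to_parquet(f"{prefix}_test.parquet", compression="snappy", index=False)


# Hypothetical raw frame with HDX-style headers; codes and values are placeholders.
raw = pd.DataFrame(
    {
        "Country Name": ["Ghana"] * 10,
        "Indicator Code": ["A", "N/A", "B", "-", "C", "D", "E", "F", "G", "no data"],
        "Value": [float(i) for i in range(10)],
    }
)
clean_df = clean(raw)
train, test = split_80_20(clean_df)
print(clean_df.columns.tolist(), len(train), len(test))
```

Because the seed is fixed, rerunning `split_80_20` on the same cleaned frame reproduces the same partition.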
---
## Limitations
- Data originates from World Bank Group and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/world-bank-gender-indicators-for-ghana) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_world_bank_gender_indicators_for_ghana,
title = {Ghana - Gender},
author = {World Bank Group},
year = {2026},
url = {https://data.humdata.org/dataset/world-bank-gender-indicators-for-ghana},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider:
electricsheepafrica
Compiled summary
Dataset Introduction

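Construction
The dataset sits at the intersection of gender equality and development policy. It was built by consolidating statistics from the World Bank's data portal. The raw data were obtained from the Humanitarian Data Exchange (HDX) and converted into a machine-learning-friendly format by an automated pipeline:
- the files were downloaded through the CKAN API;
- missing-value markers were unified to NaN;
- column names were standardized to snake_case;
- the rows were split 80/20 into training and test sets with a fixed random seed.

The result is stored as Snappy-compressed Parquet, which keeps the structure consistent and efficient to process.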
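Features
The dataset focuses on gender indicators for Ghana. It covers country-level aggregates from 1960 to 2025, with 4,949 rows and 8 variables. It brings together sex-disaggregated indicators across dimensions such as demography, education, and health, for example population counts by age and sex. These are presented through a mix of numeric and categorical variables. The data have been cleaned and standardized, missing values are marked uniformly, and the rows come with a fixed train/test split. Together, this gives empirical work a well-structured starting point.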
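Usage
For machine-learning work, the dataset can be loaded directly with the Hugging Face `datasets` library and converted to a pandas DataFrame for further analysis. It suits tabular classification and regression tasks. Users can predict the numeric `value` from features such as `year` and `indicator_code`, or explore time trends and associations among gender indicators. The data come from the World Bank's original collection, so users should consult the publisher's methodology notes to understand possible definitional differences and statistical limitations.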
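One illustration of the regression use described above is a deliberately simple baseline: fit a straight line in `year` for each `indicator_code` on the training split, then predict the test rows. The linear-trend model is an assumption chosen for brevity, not a method proposed by the source. `fit_trends` and `predict` are hypothetical helpers, and the toy frames use placeholder numbers.

```python
import numpy as np
import pandas as pd


def fit_trends(train: pd.DataFrame) -> dict:
    """Fit value ~ slope*year + intercept per indicator_code; constant if <2 distinct years."""
    models = {}
    for code, grp in train.groupby("indicator_code"):
        x = grp["year"].to_numpy(dtype=float)
        y = grp["value"].to_numpy(dtype=float)
        if np.unique(x).size >= 2:
            models[code] = np.polyfit(x, y, deg=1)  # [slope, intercept]
        else:
            models[code] = np.array([0.0, y.mean()])  # flat line at the mean
    return models


def predict(models: dict, test: pd.DataFrame) -> np.ndarray:
    """Predict each test row with its indicator's line (NaN for unseen indicators)."""
    preds = []
    for code, year in zip(test["indicator_code"], test["year"]):
        coeffs = models.get(code)
        preds.append(np.polyval(coeffs, float(year)) if coeffs is not None else np.nan)
    return np.asarray(preds, dtype=float)


# Placeholder frames in the card's schema; "X" and "Y" are hypothetical codes.
train = pd.DataFrame(
    {"indicator_code": ["X"] * 3, "year": [2000, 2001, 2003], "value": [10.0, 12.0, 16.0]}
)
test = pd.DataFrame({"indicator_code": ["X", "Y"], "year": [2002, 2002]})

models = fit_trends(train)
preds = predict(models, test)
print(preds)
```

Comparing these predictions with the test split's `value` column, for example by mean absolute error per indicator, gives a floor that more elaborate models should beat.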
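Background and Challenges
Background
On the global development agenda, gender equality is both a core human right and a key lever for social progress and economic growth. In December 2015, the World Bank Group Board discussed the Gender Equality Strategy 2016–2023, which gives priority to collecting and analyzing gender data. Its aim is to use statistical evidence to reveal differences between women and men in population structure, educational opportunity, health, and economic participation.

The dataset was created by the World Bank Group and repackaged into a machine-learning-ready format by Electric Sheep Africa in 2026. It focuses on gender indicators for Ghana and covers country-level aggregates from 1960 to 2025. It is meant to help quantify gender inequality and to give policymakers, researchers, and international organizations an empirical basis for two purposes: assessing development interventions and advancing inclusive growth strategies.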
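Current Challenges
The dataset addresses the challenge of measuring gender development across many dimensions. Its listed tasks are tabular classification and regression, with the aim of identifying factors associated with gender gaps among socioeconomic indicators. Building it involves several obstacles.

- **Source quality.** The underlying figures can depend on reports from national statistical agencies. They may suffer from inconsistent definitions, breaks in time series, or incomplete coverage, which limits comparability across years and indicators.
- **Limits of cleaning.** Automated cleaning unifies missing-value markers but cannot correct misreporting or sampling error in the source data. Users should therefore interpret results alongside the World Bank's methodology notes.
- **Generalization.** The modest size and single-country scope of the dataset limit how well models generalize to other geographic or cultural settings. This puts a premium on methods that are both robust and interpretable.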
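The breaks in time series mentioned above can be located directly. The minimal sketch below lists, for each indicator, the years missing between its first and last observation. `year_gaps` is a hypothetical helper, and the toy frame uses placeholder codes. On the real data, run it on the train and test splits concatenated, since a random 80/20 split leaves holes in either split alone.

```python
import pandas as pd


def year_gaps(df: pd.DataFrame) -> dict:
    """Map each indicator_code to the sorted years missing inside its observed span."""
    gaps = {}
    for code, years in df.groupby("indicator_code")["year"]:
        observed = {int(y) for y in years}
        missing = [y for y in range(min(observed), max(observed) + 1) if y not in observed]
        if missing:
            gaps[code] = missing
    return gaps


frame = pd.DataFrame(
    {"indicator_code": ["A", "A", "A", "B", "B"], "year": [2000, 2001, 2004, 2010, 2011]}
)
print(year_gaps(frame))  # {'A': [2002, 2003]}
```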
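Common Use Cases
Typical Use Cases
In gender and development research, the dataset is a country-level source of gender statistics for Ghana. It can be used to build time-series analyses that track the long-run evolution of gender indicators in demography, education, health, and other areas. By combining indicator codes with their numeric values, researchers can help assess the effects of gender-equality policies over time, providing empirical input for development economics and public-policy analysis.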
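As an example of tracking a gender indicator over time, paired female/male codes such as `SP.POP.AG03.FE.IN` and `SP.POP.AG03.MA.IN` can be turned into a yearly female-to-male ratio. Below is a sketch assuming pandas. `female_male_ratio` is a hypothetical helper, and the values are placeholders.

```python
import numpy as np
import pandas as pd


def female_male_ratio(df: pd.DataFrame, female_code: str, male_code: str) -> pd.Series:
    """Yearly ratio of a female indicator to its male counterpart (years with both only)."""
    wide = df.pivot(index="year", columns="indicator_code", values="value")
    ratio = wide[female_code] / wide[male_code]
    # Division by a zero male value yields inf; treat it as missing, then drop gaps.
    ratio = ratio.replace([np.inf, -np.inf], np.nan).dropna()
    return ratio.rename("female_to_male")


frame = pd.DataFrame(
    {
        "year": [2000, 2000, 2001, 2001, 2002],
        "indicator_code": [
            "SP.POP.AG03.FE.IN",
            "SP.POP.AG03.MA.IN",
            "SP.POP.AG03.FE.IN",
            "SP.POP.AG03.MA.IN",
            "SP.POP.AG03.FE.IN",
        ],
        "value": [98.0, 100.0, 102.0, 100.0, 50.0],
    }
)
ratio = female_male_ratio(frame, "SP.POP.AG03.FE.IN", "SP.POP.AG03.MA.IN")
print(ratio)
```

Year 2002 is dropped because only the female value is present. Keeping only years with both values avoids mixing partial observations into the trend.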
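Related Work
This dataset can support several kinds of work:
- machine-learning models that forecast gender gaps;
- panel regressions of policy effects;
- cross-country comparisons of gender indices, when combined with data for other countries.

Such data are often combined with health surveys and labour-market data to build multidimensional poverty measures. They are also used to build visualization tools that make gender statistics more accessible to the public, extending the methods of development data science.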
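Recent Research on the Dataset
Latest Research Directions
In gender and development research, machine-learning analysis of policy interventions is an active direction that data such as this Ghana gender dataset can support. Researchers can use its time-series structure to build forecasting models that simulate the long-term effects of gender-equality policies in education, health, and other areas. These can be combined with causal-inference methods to identify key drivers. As the global sustainable-development agenda advances, such data help in assessing gender gaps across sectors and in informing resource allocation, and they offer a country-level benchmark for empirical research.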
The content above was collected and summarized by 遇见数据集 ("Meet Datasets").



