electricsheepafrica/africa-world-bank-gender-indicators-for-gambia-the
Hugging Face · Updated 2026-04-11 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-world-bank-gender-indicators-for-gambia-the
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- gender
- indicators
- gmb
pretty_name: "Gambia, The - Gender"
dataset_info:
  splits:
  - name: train
    num_examples: 3716
  - name: test
    num_examples: 929
---
# Gambia, The - Gender
**Publisher:** World Bank Group · **Source:** [HDX](https://data.humdata.org/dataset/world-bank-gender-indicators-for-gambia-the) · **License:** `cc-by` · **Updated:** 2026-03-27
---
## Abstract
Contains data from the World Bank's [data portal](http://data.worldbank.org/). There is also a [consolidated country dataset](https://data.humdata.org/dataset/world-bank-combined-indicators-for-gambia-the) on HDX.
Gender equality is a core development objective in its own right. It is also smart development policy and sound business practice. It is integral to economic growth, business growth and good development outcomes. Gender equality can boost productivity, enhance prospects for the next generation, build resilience, and make institutions more representative and effective. In December 2015, the World Bank Group Board discussed our new Gender Equality Strategy 2016-2023, which aims to address persistent gaps and proposed a sharpened focus on more and better gender data. The Bank Group is continually scaling up commitments and expanding partnerships to fill significant gaps in gender data. The database hosts the latest sex-disaggregated data and gender statistics covering demography, education, health, access to economic opportunities, public life and decision-making, and agency.
Each row is a country-level aggregate: the value of one indicator for one year. Data was last updated on HDX on 2026-03-27. Geographic scope: **GMB**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Public health |
| **Unit of observation** | Country-level aggregates |
| **Rows (total)** | 4,646 |
| **Columns** | 8 (2 numeric, 6 categorical, 0 datetime) |
| **Train split** | 3,716 rows |
| **Test split** | 929 rows |
| **Geographic scope** | GMB |
| **Publisher** | World Bank Group |
| **HDX last updated** | 2026-03-27 |
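
The split sizes listed above sum to 4,645, one fewer than the stated total of 4,646. A minimal sketch of that arithmetic, using only the figures on this card:

```python
# Figures copied from the table above.
total_rows = 4646
train_rows = 3716
test_rows = 929

split_sum = train_rows + test_rows
difference = total_rows - split_sum
test_share = test_rows / split_sum

print(f"train + test = {split_sum}")
print(f"stated total - split sum = {difference}")
print(f"test share of split rows = {test_share:.3f}")
```

The two split sizes are an exact 80/20 division of 4,645 rows, so comparing `len(train) + len(test)` after loading is the quickest way to see which figure applies.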
---
## Variables
**Geographic:** `country_name` (Gambia, The), `country_iso3` (GMB), `year` (range 1960–2025).
**Outcome / Measurement:** `value` (range 0.0–391853.0).
**Identifier / Metadata:** `indicator_name` (e.g. "Age population, age 02, male"; "Age population, age 01, male"; "Age population, age 00, female"), `indicator_code` (e.g. SP.POP.AG02.MA.IN, SP.POP.AG01.MA.IN, SP.POP.AG00.FE.IN), `esa_source` (HDX), `esa_processed` (2026-04-11).
---
## Quick Start
```python
from datasets import load_dataset

# Load both splits from the Hugging Face Hub.
ds = load_dataset("electricsheepafrica/africa-world-bank-gender-indicators-for-gambia-the")

# Convert each split to a pandas DataFrame.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())
```
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `country_name` | object | 0.0% | Gambia, The |
| `country_iso3` | object | 0.0% | GMB |
| `year` | int64 | 0.0% | 1960 – 2025 (mean 1999.7512) |
| `indicator_name` | object | 0.0% | "Age population, age 02, male"; "Age population, age 01, male"; "Age population, age 00, female" |
| `indicator_code` | object | 0.0% | SP.POP.AG02.MA.IN, SP.POP.AG01.MA.IN, SP.POP.AG00.FE.IN |
| `value` | float64 | 0.0% | 0.0 – 391853.0 (mean 6403.2024) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-11 |
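
The schema is in long format: each row holds one `value` for one `indicator_code` and `year`. Below is a minimal sketch of reshaping such rows into one record per year, written in plain Python so it runs without extra dependencies. The sample rows and their values are invented for illustration; only the column names and indicator codes come from the table above.

```python
from collections import defaultdict

# Hypothetical rows shaped like the schema above (values are invented).
rows = [
    {"year": 1960, "indicator_code": "SP.POP.AG00.FE.IN", "value": 100.0},
    {"year": 1960, "indicator_code": "SP.POP.AG01.MA.IN", "value": 110.0},
    {"year": 1961, "indicator_code": "SP.POP.AG00.FE.IN", "value": 105.0},
]


def long_to_wide(records):
    """Group long-format records into {year: {indicator_code: value}}."""
    wide = defaultdict(dict)
    for rec in records:
        wide[rec["year"]][rec["indicator_code"]] = rec["value"]
    return dict(wide)


wide = long_to_wide(rows)
for year in sorted(wide):
    print(year, wide[year])
```

On a pandas DataFrame with these columns, `pivot_table(index="year", columns="indicator_code", values="value")` should give the same shape. Because the train/test split is random by row, a given year and indicator may sit in either partition.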
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `year` | 1960 | 2025 | 1999.7512 | 2002 |
| `value` | 0.0 | 391853.0 | 6403.2024 | 47.4893 |
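
The gap between the mean (6403.2024) and the median (47.4893) of `value` suggests that the column pools indicators measured on very different scales, so per-indicator summaries are likely to be more informative than the pooled ones. This is a hedged sketch with invented rows; `EXAMPLE.RATE.PCT` is a hypothetical indicator code, not one taken from the dataset.

```python
from collections import defaultdict
from statistics import mean, median

# Invented values; "EXAMPLE.RATE.PCT" is a hypothetical indicator code.
rows = [
    {"indicator_code": "SP.POP.AG00.FE.IN", "value": 30000.0},
    {"indicator_code": "SP.POP.AG00.FE.IN", "value": 32000.0},
    {"indicator_code": "EXAMPLE.RATE.PCT", "value": 40.0},
    {"indicator_code": "EXAMPLE.RATE.PCT", "value": 45.0},
    {"indicator_code": "EXAMPLE.RATE.PCT", "value": 50.0},
]


def summarise_by_indicator(records):
    """Return {indicator_code: (mean, median)} for the value column."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["indicator_code"]].append(rec["value"])
    return {code: (mean(vals), median(vals)) for code, vals in groups.items()}


# Pooled statistics mix both scales; grouped statistics keep them apart.
pooled = [rec["value"] for rec in rows]
stats = summarise_by_indicator(rows)

print("pooled mean / median:", mean(pooled), median(pooled))
for code, (avg, mid) in stats.items():
    print(code, avg, mid)
```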
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
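
The cleaning and splitting steps described above can be sketched as follows. This is an illustration in plain Python, not the pipeline Electric Sheep Africa actually ran: the function names are hypothetical, case-insensitive marker matching is an assumption, and the shuffle will not reproduce the published split even with the same seed. The CKAN download and the Parquet write are left out.

```python
import math
import random
import re

# Markers listed in the Curation paragraph, lowercased for matching.
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def to_snake_case(name):
    """Lowercase a column name and join its alphanumeric parts with '_'."""
    parts = re.findall(r"[A-Za-z0-9]+", name)
    return "_".join(part.lower() for part in parts)


def clean_value(value):
    """Map missing-value markers to NaN; leave every other value as-is."""
    # Assumption: matching ignores case and surrounding whitespace.
    if isinstance(value, str) and value.strip().lower() in MISSING_MARKERS:
        return math.nan
    return value


def clean_rows(rows):
    """Apply column renaming and marker unification to every row."""
    return [
        {to_snake_case(key): clean_value(val) for key, val in row.items()}
        for row in rows
    ]


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle with a fixed seed, then cut into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Invented raw rows with un-normalised column names and one missing marker.
raw = [
    {"Country Name": "Gambia, The", "Year": 1960 + i,
     "Value": "N/A" if i == 0 else float(i)}
    for i in range(10)
]

cleaned = clean_rows(raw)
train, test = split_rows(cleaned)
print(sorted(cleaned[0]))
print(len(train), len(test))
```

Seeding a private `random.Random` instance keeps the split repeatable without touching the global random state.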
---
## Limitations
- Data originates from the World Bank Group and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/world-bank-gender-indicators-for-gambia-the) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_world_bank_gender_indicators_for_gambia_the,
title = {Gambia, The - Gender},
author = {World Bank Group},
year = {2026},
url = {https://data.humdata.org/dataset/world-bank-gender-indicators-for-gambia-the},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
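Provider:
electricsheepafrica
Collected summary
Dataset introduction

Construction
With gender equality treated as a core development issue, this dataset originates from the World Bank Group's official data portal. The Electric Sheep Africa team retrieved the raw data through the HDX platform's CKAN API and processed it systematically: column names were standardised to snake_case and missing-value markers were normalised to NaN for a consistent format. The data was then split 80:20 into training and test sets using a fixed random seed and stored as Snappy-compressed Parquet, giving a structured tabular dataset suited to machine learning tasks.
Characteristics
The dataset focuses on gender indicators for The Gambia and covers country-level aggregates from 1960 to 2025, a long observation window. It brings together sex-disaggregated indicators across demography, education, health, and economic opportunity, with each variable identified by a paired indicator name and code. The dataset contains 4,646 records, split into 3,716 training samples and 929 test samples, and no field has missing values, which supports quantitative analysis of gender equality policy.
Usage
For public health and development research, the dataset can be loaded directly with the Hugging Face `datasets` library. After calling `load_dataset` with the dataset identifier, users can convert the result to a pandas DataFrame for exploratory analysis or model training. The dataset suits tabular classification and regression tasks, such as predicting gender equality trends from historical indicators or building classifiers to identify the effects of specific policies. Because the data comes from the World Bank, it should be interpreted alongside the original methodology notes to avoid problems caused by definitional inconsistencies or reporting bias.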
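Background and challenges
Background overview
In the global development agenda, gender equality is both a core social justice goal and a key driver of economic growth and social resilience. The World Bank Group's Gender Equality Strategy (2016-2023) aims to address persistent development gaps systematically by strengthening the collection and analysis of gender data. The dataset was created by the World Bank Group and repackaged by Electric Sheep Africa in 2026 into a machine-learning-ready format. It focuses on country-level gender indicators for The Gambia, spanning population structure, education, health, and economic opportunity. Its central research question is how to quantify the way gender gaps change over time, giving policymakers and researchers an empirical basis for evaluating interventions and promoting inclusive development.
Current challenges
The dataset targets the challenge of making gender development indicators computable and usable for predictive modelling, with the aim of using machine learning to reveal the complex relationships between gender inequality and macroeconomic and public health outcomes. Its construction faces several challenges. The raw data depends on national statistical systems and may contain reporting inconsistencies, definitional differences, and sampling bias, which affect comparability across years and indicators. The automated cleaning pipeline unifies missing-value markers but cannot correct misreported values or methodological limitations in the source data. In addition, the dataset contains only country-level aggregates and lacks individual or subregional detail, which limits analysis of micro-level mechanisms and constrains model generalisation and causal inference.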
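Common scenarios
Typical use cases
In gender and development research, the dataset provides structured time-series data for analysing gender equality dynamics in The Gambia. Researchers can use its country-level aggregate indicators for 1960 to 2025 to build classification or regression models that reveal how gender differences in population structure, education, health, and other dimensions have evolved. Such analysis helps quantify historical trends and provides an empirical basis for policy evaluation.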
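
One simple way to quantify a historical trend for a single indicator is an ordinary least-squares line over its yearly values. The sketch below uses an invented series and a hypothetical helper, `linear_trend`. It illustrates the idea only and makes no claim about how the dataset has been modelled in practice.

```python
def linear_trend(years, values):
    """Ordinary least-squares fit of value = intercept + slope * year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented yearly values for one indicator.
years = [2000, 2001, 2002, 2003]
values = [10.0, 12.0, 14.0, 16.0]

slope, intercept = linear_trend(years, values)
projection = intercept + slope * 2004

print(f"slope per year: {slope:.2f}")
print(f"projected 2004 value: {projection:.2f}")
```

A straight line is a crude model for multi-decade indicators, and the fit should be run per indicator, since `value` mixes different units across indicators.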
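Practical applications
In policy-making and international development projects, the dataset can be used to monitor The Gambia's progress on the gender-related Sustainable Development Goals (SDGs). Government departments and non-profit organisations can use its sex-disaggregated population and health indicators to assess gender inclusiveness in areas such as education coverage and the allocation of medical resources, and then design more targeted social programmes that promote women's economic participation and representation in public decision-making.
Derived and related work
Work derived from this dataset includes using time-series forecasting models to analyse changes in The Gambia's population structure by sex, and comparative studies that combine it with data from other African countries. For example, it can be included in cross-country panel analyses of the correlation between gender indicators and macroeconomic performance, and data of this kind has also been used to develop automated monitoring tools that provide dynamic visualisations for regional gender equality reports.
The above content was collected and summarised by 遇见数据集.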



