electricsheepafrica/africa-world-bank-gender-indicators-for-guinea
Hugging Face · Updated 2026-04-11 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-world-bank-gender-indicators-for-guinea
Description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- gender
- indicators
- gin
pretty_name: "Guinea - Gender"
dataset_info:
  splits:
  - name: train
    num_examples: 3701
  - name: test
    num_examples: 925
---
# Guinea - Gender
**Publisher:** World Bank Group · **Source:** [HDX](https://data.humdata.org/dataset/world-bank-gender-indicators-for-guinea) · **License:** `cc-by` · **Updated:** 2026-03-27
---
## Abstract
Contains data from the World Bank's [data portal](http://data.worldbank.org/). There is also a [consolidated country dataset](https://data.humdata.org/dataset/world-bank-combined-indicators-for-guinea) on HDX.
Gender equality is a core development objective in its own right. It is also smart development policy and sound business practice. It is integral to economic growth, business growth and good development outcomes. Gender equality can boost productivity, enhance prospects for the next generation, build resilience, and make institutions more representative and effective. In December 2015, the World Bank Group Board discussed our new Gender Equality Strategy 2016-2023, which aims to address persistent gaps and proposes a sharpened focus on more and better gender data. The Bank Group is continually scaling up commitments and expanding partnerships to fill significant gaps in gender data. The database hosts the latest sex-disaggregated data and gender statistics covering demography, education, health, access to economic opportunities, public life and decision-making, and agency.
Each row in this dataset is a country-level aggregate. The data was last updated on HDX on 2026-03-27. Geographic scope: **GIN**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Public health |
| **Unit of observation** | Country-level aggregates |
| **Rows (total)** | 4,627 |
| **Columns** | 8 (2 numeric, 6 categorical, 0 datetime) |
| **Train split** | 3,701 rows |
| **Test split** | 925 rows |
| **Geographic scope** | GIN |
| **Publisher** | World Bank Group |
| **HDX last updated** | 2026-03-27 |
---
## Variables
**Geographic:** `country_name` (Guinea), `country_iso3` (GIN), `year` (range 1960–2025).
**Outcome / Measurement:** `value` (range 0.0–1963865.0).
**Identifier / Metadata:** `indicator_name` (e.g. "Age population, age 03, female"; "Age population, age 05, female"; "Age population, age 00, male"), `indicator_code` (e.g. `SP.POP.AG03.FE.IN`, `SP.POP.AG05.FE.IN`, `SP.POP.AG00.MA.IN`), `esa_source` (HDX), `esa_processed` (2026-04-11).
---
## Quick Start
```python
from datasets import load_dataset
ds = load_dataset("electricsheepafrica/africa-world-bank-gender-indicators-for-guinea")
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()
print(train.shape)
train.head()
```
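The columns of this dataset (`year`, `indicator_code`, `value`) suggest a long layout, with one row per indicator and year. For time-series work a wide table with one column per indicator is often easier to handle. The following is a minimal sketch of that reshaping, assuming at most one value per (year, indicator) pair; the helper name `to_wide` and the toy values are illustrative and not part of the dataset.

```python
import pandas as pd


def to_wide(df: pd.DataFrame) -> pd.DataFrame:
    """Pivot long rows (year, indicator_code, value) into one column per indicator."""
    return df.pivot_table(
        index="year",
        columns="indicator_code",
        values="value",
        aggfunc="first",  # assumption: at most one value per (year, indicator) pair
    ).sort_index()


# Toy rows that mimic the schema; the values are made up for illustration.
toy = pd.DataFrame({
    "year": [2000, 2000, 2001, 2001],
    "indicator_code": [
        "SP.POP.AG03.FE.IN", "SP.POP.AG05.FE.IN",
        "SP.POP.AG03.FE.IN", "SP.POP.AG05.FE.IN",
    ],
    "value": [100.0, 90.0, 110.0, 95.0],
})

wide = to_wide(toy)
print(wide)
```

Because the train/test split is random over rows, a single indicator's yearly series is likely scattered across both partitions. Concatenating `train` and `test` before pivoting may be preferable when a complete series is needed.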
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `country_name` | object | 0.0% | Guinea |
| `country_iso3` | object | 0.0% | GIN |
| `year` | int64 | 0.0% | 1960 – 2025 (mean 2000.11) |
| `indicator_name` | object | 0.0% | "Age population, age 03, female"; "Age population, age 05, female"; "Age population, age 00, male" |
| `indicator_code` | object | 0.0% | SP.POP.AG03.FE.IN, SP.POP.AG05.FE.IN, SP.POP.AG00.MA.IN |
| `value` | float64 | 0.0% | 0.0 – 1963865.0 (mean 36735.6533) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-11 |
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `year` | 1960.0 | 2025.0 | 2000.11 | 2003.0 |
| `value` | 0.0 | 1963865.0 | 36735.6533 | 41.4125 |
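
The pooled mean of `value` (36735.65) is far above its median (41.41), which suggests that indicators on very different scales share the one column, for example population counts next to much smaller figures. Summarising per indicator is therefore likely more informative than pooling. A minimal sketch, using a toy frame with hypothetical indicator codes rather than the real data:

```python
import pandas as pd


def summarise_by_indicator(df: pd.DataFrame) -> pd.DataFrame:
    """Min, max, mean and median of `value` for each indicator code."""
    return df.groupby("indicator_code")["value"].agg(["min", "max", "mean", "median"])


# Hypothetical codes and made-up values: one count-like and one share-like indicator.
toy = pd.DataFrame({
    "indicator_code": ["COUNT.EXAMPLE"] * 3 + ["SHARE.EXAMPLE"] * 3,
    "value": [1000.0, 2000.0, 3000.0, 10.0, 20.0, 60.0],
})

summary = summarise_by_indicator(toy)
print(summary)

# Pooled statistics mix both scales and describe neither indicator well.
print(toy["value"].mean(), toy["value"].median())
```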
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
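
The steps described above can be approximated as follows. This is a sketch under stated assumptions, not the actual ESA pipeline: the helper names, the exact-match and case-sensitive handling of missing markers, the use of `DataFrame.sample` for the split, and the output file prefix are all hypothetical. Writing Parquet also requires a Parquet engine such as `pyarrow`, so `save` is defined but not called here.

```python
import re

import numpy as np
import pandas as pd

# Missing-value markers listed in the Curation section.
MISSING_MARKERS = ["N/A", "null", "none", "-", "unknown", "no data", "#N/A"]


def to_snake_case(name: str) -> str:
    """Lowercase a column name and join its alphanumeric runs with underscores."""
    return "_".join(re.findall(r"[a-z0-9]+", name.lower()))


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Standardise column names and unify missing-value markers to NaN."""
    out = df.rename(columns=to_snake_case)
    # Whole-cell, case-sensitive matches only (an assumption of this sketch).
    return out.replace(MISSING_MARKERS, np.nan)


def split(df: pd.DataFrame, test_frac: float = 0.2, seed: int = 42):
    """Random 80/20 split with a fixed seed; assumes a unique index."""
    test = df.sample(frac=test_frac, random_state=seed)
    train = df.drop(test.index)
    return train, test


def save(train: pd.DataFrame, test: pd.DataFrame, prefix: str = "guinea_gender") -> None:
    """Write both partitions as Snappy-compressed Parquet (hypothetical file names)."""
    train.to_parquet(f"{prefix}_train.parquet", compression="snappy")
    test.to_parquet(f"{prefix}_test.parquet", compression="snappy")


# Toy input with made-up values.
raw = pd.DataFrame({
    "Country Name": ["Guinea"] * 10,
    "Year": list(range(2000, 2010)),
    "Value": ["1.5", "N/A", "2.5", "-", "3.0", "null", "4.0", "4.5", "5.0", "5.5"],
})

cleaned = clean(raw)
train, test = split(cleaned)
print(list(cleaned.columns), len(train), len(test))
```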
---
## Limitations
- Data originates from the World Bank Group and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/world-bank-gender-indicators-for-guinea) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_world_bank_gender_indicators_for_guinea,
title = {Guinea - Gender},
author = {World Bank Group},
year = {2026},
url = {https://data.humdata.org/dataset/world-bank-gender-indicators-for-guinea},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider: electricsheepafrica
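## Aggregated Summary
### Dataset Introduction

#### Construction
In gender and development research, the systematic integration of high-quality data is essential for policymaking. This dataset derives from the official gender indicators published by the World Bank Group. The Electric Sheep Africa team retrieved the raw data through the CKAN API of the HDX platform and standardised it into a machine-learning-friendly format. During construction, column names were converted to snake_case and common missing-value markers were normalised to NaN to keep the data consistent. The 4,627 records were then split 80:20 into training and test sets with a fixed random seed and stored as Snappy-compressed Parquet, which gives later analyses a clearly structured, directly loadable base.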
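#### Characteristics
The dataset focuses on gender statistics for Guinea and covers country-level aggregates from 1960 to 2025. Its eight structured columns carry indicators across demography, health and other dimensions: geographic identifiers, year, indicator name and code, and the value itself. The dataset is moderate in size, with 3,701 training rows and 925 test rows, and no column contains missing values. Indicator codes such as SP.POP.AG03.FE.IN follow the World Bank's standard scheme, which makes the data comparable internationally and consistent over time when studying the trajectory of gender equality.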
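#### Usage
For machine-learning or statistical work, the dataset can be loaded with the Hugging Face `datasets` library. Calling `load_dataset` with the dataset path returns the pre-split training and test sets, and `to_pandas` converts each one to a DataFrame for further processing. The dataset suits tabular classification and regression tasks, such as predicting gender-related trends from historical indicators. The data comes from the World Bank and has been cleaned but not independently validated, so results should be interpreted alongside the original publisher's methodology notes.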
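### Background and Challenges
#### Background Overview
Gender equality is a core development objective. It is a matter of social justice and also a driver of economic growth and institutional effectiveness. The World Bank Group's Gender Equality Strategy (2016-2023) aims to address persistent gender gaps worldwide by strengthening the collection and analysis of gender data. The 'africa-world-bank-gender-indicators-for-guinea' dataset comes out of this context: the indicators are published by the World Bank Group and distributed through the Humanitarian Data Exchange (HDX), and Electric Sheep Africa repackaged them in 2026 into a format usable for machine learning. The dataset focuses on Guinea and gathers national gender statistics from 1960 to 2025, covering population structure, education, health and economic opportunity. It provides a detailed data basis for studying gender dynamics in the country and a useful reference for public policy and development research.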
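#### Current Challenges
The dataset targets the analysis and prediction of gender development indicators. The central task is to identify how gender gaps evolve, and what drives them, from multidimensional time-series data, and to build statistical or machine-learning models that forecast future trends. Several obstacles remain. The raw data comes from World Bank macro-level statistics and may contain inconsistent reporting, definitional differences or sampling bias, and automated cleaning cannot fully correct such underlying quality problems. The dataset covers only Guinea, so its conclusions may not generalise to a wider region. Although the indicators span several domains, the number of variables is small, which may fail to capture the socio-economic interactions that shape gender equality and constrains both modelling and causal inference.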
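### Common Scenarios
#### Typical Use Cases
In research on gender equality and development economics, this dataset serves as a standardised collection of national gender indicators for Guinea and is often used to build time-series models. With annual data from 1960 to 2025, researchers can apply regression or classification methods to study how population structure, education and health have evolved by sex, and to forecast future trends.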
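
As an illustration of the trend-forecasting use case, the sketch below fits a straight line to one indicator's yearly series with `numpy.polyfit` and extrapolates it. It is a deliberately simple baseline on made-up numbers, not a recommended model for this data; the function names are hypothetical.

```python
import numpy as np


def fit_linear_trend(years, values):
    """Least-squares line through (year, value); returns (slope, intercept)."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    # Centre the years so the fit stays well conditioned for values near 2000.
    offset = years.mean()
    slope, intercept_centred = np.polyfit(years - offset, values, deg=1)
    return slope, intercept_centred - slope * offset


def predict(slope, intercept, year):
    """Evaluate the fitted line at a given year."""
    return slope * year + intercept


# Made-up series for a single indicator.
years = [2000, 2001, 2002, 2003]
values = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_linear_trend(years, values)
forecast = predict(slope, intercept, 2005)
print(slope, forecast)
```

With the real data, the series for one `indicator_code` would first need to be selected and sorted by `year`, since different indicators share the `value` column.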
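#### Academic Problems Addressed
The dataset helps with the fragmentation and limited comparability of gender data in development research. By consolidating gender statistics published by the World Bank, it lets scholars examine the relationships between gender equality, economic growth and human capital accumulation in a systematic way and assess the long-term effects of public policy interventions, which deepens the theoretical understanding of gender-inclusive development.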
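#### Related Derived Work
Work derived from this dataset mostly applies machine learning to the prediction of gender development indicators and to gap analysis. For example, some studies build forecasting models on time-series features to estimate future changes in sex ratios. Others apply clustering algorithms to compare Guinea's gender indicator patterns with those of other African countries and to reveal regional differences and commonalities in gender development.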
The content above was collected and summarised by 遇见数据集.



