electricsheepafrica/africa-world-bank-combined-indicators-for-guinea
Hugging Face · Updated 2026-04-11 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-world-bank-combined-indicators-for-guinea
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- tabular-classification
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- agriculture-livestock
- aid-effectiveness
- climate-weather
- development
- economics
- education
- energy
- environment
- gin
pretty_name: "Guinea - Economic, Social, Environmental, Health, Education, Development and Energy"
dataset_info:
  splits:
  - name: train
    num_examples: 40008
  - name: test
    num_examples: 10002
---
# Guinea - Economic, Social, Environmental, Health, Education, Development and Energy
**Publisher:** World Bank Group · **Source:** [HDX](https://data.humdata.org/dataset/world-bank-combined-indicators-for-guinea) · **License:** `cc-by` · **Updated:** 2026-03-27
---
## Abstract
This dataset contains data from the World Bank's [data portal](http://data.worldbank.org/) covering the following topics, which also exist as individual datasets on HDX: [Agriculture and Rural Development](https://data.humdata.org/dataset/world-bank-agriculture-and-rural-development-indicators-for-guinea), [Aid Effectiveness](https://data.humdata.org/dataset/world-bank-aid-effectiveness-indicators-for-guinea), [Economy and Growth](https://data.humdata.org/dataset/world-bank-economy-and-growth-indicators-for-guinea), [Education](https://data.humdata.org/dataset/world-bank-education-indicators-for-guinea), [Energy and Mining](https://data.humdata.org/dataset/world-bank-energy-and-mining-indicators-for-guinea), [Environment](https://data.humdata.org/dataset/world-bank-environment-indicators-for-guinea), [Financial Sector](https://data.humdata.org/dataset/world-bank-financial-sector-indicators-for-guinea), [Health](https://data.humdata.org/dataset/world-bank-health-indicators-for-guinea), [Infrastructure](https://data.humdata.org/dataset/world-bank-infrastructure-indicators-for-guinea), [Social Protection and Labor](https://data.humdata.org/dataset/world-bank-social-protection-and-labor-indicators-for-guinea), [Poverty](https://data.humdata.org/dataset/world-bank-poverty-indicators-for-guinea), [Private Sector](https://data.humdata.org/dataset/world-bank-private-sector-indicators-for-guinea), [Public Sector](https://data.humdata.org/dataset/world-bank-public-sector-indicators-for-guinea), [Science and Technology](https://data.humdata.org/dataset/world-bank-science-and-technology-indicators-for-guinea), [Social Development](https://data.humdata.org/dataset/world-bank-social-development-indicators-for-guinea), [Urban Development](https://data.humdata.org/dataset/world-bank-urban-development-indicators-for-guinea), [Gender](https://data.humdata.org/dataset/world-bank-gender-indicators-for-guinea), [Millennium Development Goals](https://data.humdata.org/dataset/world-bank-millenium-development-goals-indicators-for-guinea), [Climate Change](https://data.humdata.org/dataset/world-bank-climate-change-indicators-for-guinea), [External Debt](https://data.humdata.org/dataset/world-bank-external-debt-indicators-for-guinea), [Trade](https://data.humdata.org/dataset/world-bank-trade-indicators-for-guinea).
Each row in this dataset is a country-level aggregate: the value of one indicator for Guinea in one year. The data was last updated on HDX on 2026-03-27. Geographic scope: **GIN**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Multi-sector development indicators (economic, social, environmental, health, education, energy) |
| **Unit of observation** | Country-level aggregates |
| **Rows (total)** | 50,011 |
| **Columns** | 8 (2 numeric, 6 categorical, 0 datetime) |
| **Train split** | 40,008 rows |
| **Test split** | 10,002 rows |
| **Geographic scope** | GIN |
| **Publisher** | World Bank Group |
| **HDX last updated** | 2026-03-27 |
---
## Variables
**Geographic:** `country_name` (Guinea), `country_iso3` (GIN), `year` (range 1960–2025).
**Outcome / Measurement:** `value` (range -40930932069200.0 to 246769275016460.0; the unit depends on the indicator).
**Identifier / Metadata:** `indicator_name` (sample values: Population in largest city, Population in the largest city (% of urban population), Population in urban agglomerations of more than 1 million), `indicator_code` (sample values: EN.URB.LCTY, EN.URB.LCTY.UR.ZS, EN.URB.MCTY), `esa_source` (HDX), `esa_processed` (2026-04-11).
---
## Quick Start
```python
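from datasets import load_dataset

# Download both splits from the Hugging Face Hub (requires network access).
ds = load_dataset("electricsheepafrica/africa-world-bank-combined-indicators-for-guinea")

# Convert each split to a pandas DataFrame for exploration.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())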
```
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `country_name` | object | 0.0% | Guinea |
| `country_iso3` | object | 0.0% | GIN |
| `year` | int64 | 0.0% | 1960 – 2025 (mean 2001.7048) |
| `indicator_name` | object | 0.0% | Population in largest city, Population in the largest city (% of urban population), Population in urban agglomerations of more than 1 million |
| `indicator_code` | object | 0.0% | EN.URB.LCTY, EN.URB.LCTY.UR.ZS, EN.URB.MCTY |
| `value` | float64 | 0.0% | -40930932069200.0 – 246769275016460.0 (mean 654924760047.1743) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-11 |
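
The schema above is in long format: one row per indicator and year. For time-series work it is often easier to have one column per indicator. The following is a minimal sketch of that reshaping, not part of the published card. It assumes each `indicator_code`/`year` pair appears at most once (not verified here, so `aggfunc="first"` is used defensively), and the toy rows hold made-up values rather than real data. The helper name `to_wide` is hypothetical.

```python
import pandas as pd


def to_wide(df: pd.DataFrame) -> pd.DataFrame:
    """Pivot long-format indicator rows into one column per indicator_code."""
    wide = df.pivot_table(
        index="year",
        columns="indicator_code",
        values="value",
        aggfunc="first",  # assumption: keep the first value if a pair repeats
    )
    return wide.sort_index()


# Toy rows with made-up values, shaped like the schema above.
toy = pd.DataFrame(
    {
        "country_iso3": ["GIN"] * 4,
        "year": [2001, 2000, 2000, 2001],
        "indicator_code": ["EN.URB.LCTY", "EN.URB.LCTY", "EN.URB.MCTY", "EN.URB.MCTY"],
        "value": [1.1, 1.0, 2.0, 2.2],
    }
)
wide = to_wide(toy)
print(wide)
```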
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `year` | 1960 | 2025 | 2001.7048 | 2004 |
| `value` | -40930932069200.0 | 246769275016460.0 | 654924760047.1743 | 51.1 |
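
The large gap between the mean and the median of `value` in the table above suggests that indicators with very different units (counts, percentages, currency amounts) share a single column, so pooled statistics are hard to interpret. A hedged sketch of summarizing each indicator separately is shown here. The helper name `summarize_by_indicator` and the indicator codes `A.PCT` and `B.USD` are hypothetical, and the values are made up for illustration.

```python
import pandas as pd


def summarize_by_indicator(df: pd.DataFrame) -> pd.DataFrame:
    """Return count, min, median, and max of `value` for each indicator_code."""
    return (
        df.groupby("indicator_code")["value"]
        .agg(["count", "min", "median", "max"])
        .sort_index()
    )


# Toy rows: a percentage-like indicator and a currency-like indicator.
toy = pd.DataFrame(
    {
        "indicator_code": ["A.PCT", "A.PCT", "A.PCT", "B.USD", "B.USD"],
        "value": [10.0, 20.0, 60.0, 1.0e9, 3.0e9],
    }
)
summary = summarize_by_indicator(toy)
print(summary)
```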
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. 15,084 exact duplicate rows were removed. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
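
The steps above can be approximated with pandas. This is an illustrative sketch under stated assumptions, not the pipeline Electric Sheep Africa actually ran: the card does not say which tools were used, whether marker matching was case-sensitive, or how the split was drawn. The function names `to_snake_case`, `clean`, and `split_80_20` are hypothetical, and the toy frame contains made-up values.

```python
import pandas as pd

# Missing-value markers listed in the curation notes (matched exactly here).
MISSING_MARKERS = ["N/A", "null", "none", "-", "unknown", "no data", "#N/A"]


def to_snake_case(name: str) -> str:
    """Lowercase a column name and join its words with underscores."""
    return "_".join(name.strip().lower().replace("-", " ").split())


def clean(raw: pd.DataFrame) -> pd.DataFrame:
    """Rename columns, unify missing markers to NaN, drop exact duplicates."""
    df = raw.rename(columns=to_snake_case)
    df = df.mask(df.isin(MISSING_MARKERS))  # matching cells become NaN
    return df.drop_duplicates().reset_index(drop=True)


def split_80_20(df: pd.DataFrame, seed: int = 42):
    """Assumed split method: random 80% sample for train, the rest for test."""
    train = df.sample(frac=0.8, random_state=seed)
    test = df.drop(train.index)
    return train, test


# Toy input with one duplicate row and one missing-value marker.
raw = pd.DataFrame(
    {
        "Country Name": ["Guinea"] * 6,
        "Year": [2000, 2000, 2001, 2002, 2003, 2004],
        "Value": ["1.0", "1.0", "N/A", "3.0", "4.0", "5.0"],
    }
)
cleaned = clean(raw)
train, test = split_80_20(cleaned)
print(cleaned)
print(len(train), len(test))
```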
---
## Limitations
- The data originates from the World Bank Group and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/world-bank-combined-indicators-for-guinea) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_world_bank_combined_indicators_for_guinea,
title = {Guinea - Economic, Social, Environmental, Health, Education, Development and Energy},
author = {World Bank Group},
year = {2026},
url = {https://data.humdata.org/dataset/world-bank-combined-indicators-for-guinea},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider:
electricsheepafrica
Collected Summary
Dataset Introduction

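Construction
In the field of African development research, this dataset consolidates multidimensional indicators for Guinea from the World Bank data portal, covering more than twenty key topics including agriculture, the economy, education, energy, the environment, and health. The raw data was obtained through the HDX platform and processed by the Electric Sheep Africa team: it was downloaded via the CKAN API, converted to Parquet, its missing-value markers were unified, and about fifteen thousand duplicate records were removed. The final data was split 80/20 into training and test sets with a fixed random seed for reproducibility, yielding a structured table of country-level aggregate observations spanning 1960 to 2025.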
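Features
The dataset takes the country as its unit of observation and contains just over fifty thousand records, each with eight fields: two numeric variables and six categorical variables. The data spans sixty-five years, and indicator codes map one-to-one to indicator names; for example, EN.URB.LCTY denotes the population in the largest city. Values range from negative numbers to the hundreds of trillions, reflecting the wide spread of economic and social indicators. The dataset is stored as Snappy-compressed Parquet with complete train and test splits, and no field contains missing values, which gives machine learning tasks a highly regular input.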
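Usage
With the Hugging Face `datasets` library, users can load the dataset in a few lines of code and convert it to a pandas DataFrame for exploratory analysis. The dataset is suited to machine learning tasks such as tabular classification; researchers can build predictive models from the year, indicator code, and value variables to analyze Guinea's long-term trends in economic development, social change, and environmental change. Because the data comes from official World Bank statistics, users should consult the original methodology notes to understand the indicator definitions and the potential limitations of the collection process.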
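Background and Challenges
Background Overview
Driven by globalization and the sustainable development agenda, systematic quantitative assessment of countries' socioeconomic and environmental conditions has become central to international development research. The World Bank Group, an authoritative international financial institution, has long worked to build macro-level indicator systems across many domains in order to monitor countries' development progress. This dataset was created by the World Bank Group and repackaged into a machine-learning-ready format by Electric Sheep Africa in 2026. Its core aim is to consolidate Guinea's indicators across more than twenty key areas, including agriculture, the economy, education, energy, the environment, and health, giving researchers and policymakers a cross-domain, long-time-series, national-level view of the data that supports in-depth analysis of Guinea's development patterns and cross-domain correlation studies.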
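Current Challenges
The dataset targets the combined analysis and prediction of multidimensional development indicators. Its core task can be framed as tabular classification: identifying patterns and associations in complex socioeconomic and environmental indicators. Building it posed significant integration challenges. The raw data comes from several independent thematic World Bank datasets that had to be merged across sources and standardized, a process that removed a large number of duplicate entries and unified the missing-value markers. In addition, the data covers a long series of observations from 1960 to 2025, and indicator values span an extremely wide range, from negative values to very large positive numbers. This heterogeneity in magnitude and type places high demands on feature engineering and model robustness. The data itself may also contain reporting errors, definitional inconsistencies, or sampling bias, and these inherent limitations make reliable analysis harder still.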
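One common way to handle the wide range of magnitudes described above is to compress values with a signed logarithm and then standardize within each indicator. The sketch below illustrates that idea only; it is an assumption about a reasonable preprocessing step, not something the dataset authors prescribe. The function names and the indicator codes `X.PCT` and `Y.USD` are hypothetical, and the values are made up.

```python
import numpy as np
import pandas as pd


def signed_log1p(x: pd.Series) -> pd.Series:
    """Compress magnitude while preserving sign: sign(x) * log(1 + |x|)."""
    return np.sign(x) * np.log1p(np.abs(x))


def scale_within_indicator(df: pd.DataFrame) -> pd.Series:
    """Z-score the signed-log value separately for each indicator_code."""
    logged = signed_log1p(df["value"])
    grouped = logged.groupby(df["indicator_code"])
    # A zero standard deviation would divide by zero, so treat it as missing.
    std = grouped.transform("std").replace(0.0, np.nan)
    return (logged - grouped.transform("mean")) / std


# Toy rows mixing a percentage-like and a currency-like indicator.
toy = pd.DataFrame(
    {
        "indicator_code": ["X.PCT", "X.PCT", "Y.USD", "Y.USD"],
        "value": [10.0, 30.0, -1.0e9, 1.0e12],
    }
)
scaled = scale_within_indicator(toy)
print(scaled)
```

After scaling, each indicator has zero mean and unit sample standard deviation on the log scale, so indicators measured in percentages and in currency units become comparable as model inputs. Indicators with a single observation produce `NaN` because their sample standard deviation is undefined.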
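Common Scenarios
Classic Use Cases
In African development economics and public policy research, this dataset provides a structured source of national-level, multidimensional indicators for Guinea. Its classic use is to support time-series analysis and cross-domain correlation studies. For example, by combining indicators from the economic, social, environmental, health, and education domains, researchers can build composite development indices and assess Guinea's progress under the United Nations Sustainable Development Goals framework. Because the dataset covers observations from 1960 to 2025, researchers can track how urbanization, energy consumption patterns, or public health trends evolve over time, which provides a data foundation for longitudinal comparison and policy evaluation.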
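Practical Applications
In practice, the dataset offers a decision-support tool for international organizations, government agencies, and non-profit organizations. Using Guinea's indicator data, policymakers can simulate the socioeconomic effects of different interventions and optimize resource allocation. In public health, for example, combining health indicators with infrastructure data makes it possible to assess how healthcare facility coverage affects disease prevention and control. In energy planning, historical consumption data can be used to forecast future demand and guide the deployment of renewable energy projects. These applications directly serve Guinea's national development strategy and the targeted design of humanitarian aid programs.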
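Derived and Related Work
Classic work derived from this dataset sits mainly at the intersection of development econometrics and machine learning. Researchers have used it to build a multidimensional poverty index model for Guinea, revealing the complex relationship between inequality and economic growth. In computational social science, the data has been used to train time-series forecasting algorithms that estimate trends in school enrollment or carbon emissions. In addition, research teams have combined it with geographic information systems to develop spatial visualization tools that link indicators to regional characteristics, supporting the analysis of local development gaps. This work extends the dataset's academic value and offers a methodological reference for data-driven research on similar African countries.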
The content above was collected and summarized by 遇见数据集.



