electricsheepafrica/africa-world-bank-public-sector-indicators-for-south-africa
Hugging Face · Updated 2026-04-11 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-world-bank-public-sector-indicators-for-south-africa
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- economics
- indicators
- zaf
pretty_name: "South Africa - Public Sector"
dataset_info:
splits:
- name: train
num_examples: 2924
- name: test
num_examples: 731
---
# South Africa - Public Sector
**Publisher:** World Bank Group · **Source:** [HDX](https://data.humdata.org/dataset/world-bank-public-sector-indicators-for-south-africa) · **License:** `cc-by` · **Updated:** 2026-03-27
---
## Abstract
Contains data from the World Bank's [data portal](http://data.worldbank.org/). There is also a [consolidated country dataset](https://data.humdata.org/dataset/world-bank-combined-indicators-for-south-africa) on HDX.
Effective governments improve people's standard of living by ensuring access to essential services – health, education, water and sanitation, electricity, transport – and the opportunity to live and work in peace and security. Data here includes World Bank staff assessments of country performance in economic management, structural policies, policies for social inclusion and equity, and public sector management and institutions for the poorest countries. Also included are indicators on revenues and expenses from the International Monetary Fund's Government Finance Statistics, and on tax policies from various sources.
Each row is one country-level observation: the value of a single indicator for a single year. Data was last updated on HDX on 2026-03-27. Geographic scope: **ZAF**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Public sector and government finance |
| **Unit of observation** | Country-level aggregates |
| **Rows (total)** | 3,656 |
| **Columns** | 8 (2 numeric, 6 categorical, 0 datetime) |
| **Train split** | 2,924 rows |
| **Test split** | 731 rows |
| **Geographic scope** | ZAF |
| **Publisher** | World Bank Group |
| **HDX last updated** | 2026-03-27 |
---
## Variables
**Geographic / temporal:** `country_name` (South Africa), `country_iso3` (ZAF), `year` (1960–2024).
**Outcome / measurement:** `value` (range -552,195,678,400 to 5,591,214,716,000). The unit depends on the indicator (for example current USD, current LCU, or % of GDP), so `value` should always be read together with `indicator_code`.
**Identifier / metadata:** `indicator_name` (e.g. Military expenditure (current USD), Military expenditure (% of GDP), Military expenditure (current LCU)), `indicator_code` (e.g. MS.MIL.XPND.CD, MS.MIL.XPND.GD.ZS, MS.MIL.XPND.CN), `esa_source` (HDX), `esa_processed` (2026-04-11).
---
## Quick Start
```python
from datasets import load_dataset
ds = load_dataset("electricsheepafrica/africa-world-bank-public-sector-indicators-for-south-africa")
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()
print(train.shape)
train.head()
```
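Because the train/test partition was drawn by random row sampling (see Curation), each split holds only some years of every indicator series. A minimal sketch, assuming you want complete per-indicator time series, recombines the two splits and pivots the long table into one column per `indicator_code`. The helper name `to_wide` is hypothetical and not part of the dataset or the `datasets` library.

```python
import pandas as pd


def to_wide(train: pd.DataFrame, test: pd.DataFrame) -> pd.DataFrame:
    """Recombine the random splits and return one row per year, one column per indicator."""
    # Stack both splits: the random 80/20 split leaves gaps in every series.
    full = pd.concat([train, test], ignore_index=True)
    # Long -> wide. DataFrame.pivot raises ValueError if a (year, indicator_code)
    # pair occurs more than once, which doubles as a uniqueness check.
    wide = full.pivot(index="year", columns="indicator_code", values="value")
    return wide.sort_index()
```

With `train` and `test` from the Quick Start, `to_wide(train, test)["MS.MIL.XPND.GD.ZS"]` would give military expenditure as a share of GDP indexed by year, with `NaN` for years the source does not report.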
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `country_name` | object | 0.0% | South Africa |
| `country_iso3` | object | 0.0% | ZAF |
| `year` | int64 | 0.0% | 1960 – 2024 (mean 2002.15) |
| `indicator_name` | object | 0.0% | Military expenditure (current USD), Military expenditure (% of GDP), Military expenditure (current LCU) |
| `indicator_code` | object | 0.0% | MS.MIL.XPND.CD, MS.MIL.XPND.GD.ZS, MS.MIL.XPND.CN |
| `value` | float64 | 0.0% | -552,195,678,400 – 5,591,214,716,000 (mean 47,342,415,114.24) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-11 |
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
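| `year` | 1960 | 2024 | 2002.15 | 2005 |
| `value` | -552,195,678,400 | 5,591,214,716,000 | 47,342,415,114.24 | 45.39 |

Pooled `value` statistics mix indicators measured in different units (current USD, current LCU, % of GDP and others), which is why the mean and median differ by roughly nine orders of magnitude; statistics are only meaningful when computed per `indicator_code`.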
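Since `value` pools indicators with different units, a per-indicator summary is usually more informative than the pooled row above. A minimal sketch with pandas `groupby`, assuming `df` is a DataFrame from the Quick Start (`train`, `test`, or both concatenated); the function name `per_indicator_summary` is hypothetical.

```python
import pandas as pd


def per_indicator_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise `value` separately for each indicator_code."""
    stats = df.groupby("indicator_code")["value"].agg(
        ["count", "min", "max", "mean", "median"]
    )
    # First and last year per indicator help spot short or truncated series.
    years = df.groupby("indicator_code")["year"].agg(first_year="min", last_year="max")
    return stats.join(years)
```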
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
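The curation steps described above can be sketched in pandas. This is a reconstruction under stated assumptions, not Electric Sheep Africa's actual pipeline: the snake_case rule, the case-insensitive matching of missing-value markers, and the use of `DataFrame.sample` for the 80/20 split are all assumptions, and the helper names are hypothetical. Writing Snappy-compressed Parquet would additionally need `df.to_parquet(path, compression="snappy")` with pyarrow or fastparquet installed.

```python
import re

import numpy as np
import pandas as pd

# Markers listed in the card; matching them case-insensitively is an assumption.
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def to_snake_case(name: str) -> str:
    """Lowercase a column name and collapse runs of non-alphanumerics into '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def unify_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Replace textual missing-value markers with NaN; other cells are unchanged."""
    def fix(cell):
        if isinstance(cell, str) and cell.strip().lower() in MISSING_MARKERS:
            return np.nan
        return cell
    return df.apply(lambda col: col.map(fix))


def curate(raw: pd.DataFrame, seed: int = 42):
    """Rename columns, unify missing values, then split rows 80/20 at random."""
    df = raw.rename(columns=to_snake_case)
    df = unify_missing(df).reset_index(drop=True)
    train = df.sample(frac=0.8, random_state=seed)  # random row-level split
    test = df.drop(index=train.index)
    return train, test
```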
---
## Limitations
- Data originates from the World Bank Group and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/world-bank-public-sector-indicators-for-south-africa) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_world_bank_public_sector_indicators_for_south_africa,
title = {South Africa - Public Sector},
author = {World Bank Group},
year = {2026},
url = {https://data.humdata.org/dataset/world-bank-public-sector-indicators-for-south-africa},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider:
electricsheepafrica
Compiled summary
Dataset overview

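Construction
In research on public-sector governance, data quality directly limits the depth and breadth of policy analysis. This dataset is derived from public data released by the World Bank Group and was systematically organized by the Electric Sheep Africa team. The raw data was retrieved from the Humanitarian Data Exchange (HDX) through the CKAN API and cleaned in a standardized way: missing-value markers were unified and column names were converted to snake_case. The data was then split into training and test sets with a fixed random seed for reproducibility, and stored as Snappy-compressed Parquet to give machine learning applications a structured starting point.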
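Usage
For empirical work, the dataset can be loaded with the Hugging Face `datasets` library and converted to a pandas DataFrame for further analysis. The card lists tabular classification as its task category, and researchers can explore public spending patterns and their relationship to economic variables along dimensions such as year and indicator code. Results should be interpreted alongside the World Bank's methodology notes, keeping in mind possible definitional inconsistencies in the original data, so that conclusions remain robust.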
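Background and challenges
Background
In public administration and development research, quantifying how well a country's public sector performs is key to understanding governance and the impact of policy. This World Bank Group dataset of public-sector indicators for South Africa, repackaged into a machine-learning-ready format by Electric Sheep Africa in 2026, covers South African macroeconomic and public-finance data from 1960 to 2024. Through indicators such as military expenditure, it supports analysis of South Africa's public-sector management, structural economic policies, and social inclusion, and offers a standardized basis for policy analysis, development economics, and cross-country comparison, with reference value for African regional studies and assessments of global governance.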
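Current challenges
The dataset targets the problem of evaluating public-sector effectiveness and quantifying policy impact. The difficulty lies in capturing the complex dynamics of governance from many indicators while coping with inherent problems such as inconsistent definitions across countries, breaks in time series, and the limited representativeness of individual indicators. During construction, integration had to deal with heterogeneous source data, varied missing-value markers, and an extremely wide spread of numeric values. Automated cleaning unifies formats but cannot correct reporting bias or methodological differences in the original collection, so users need domain knowledge to interpret the data's limitations carefully.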
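Common use cases
Typical use case
In public economics and development research, the dataset provides structured time-series data for analyzing South Africa's public-sector performance. A typical use applies machine learning methods, such as tabular models, to trend forecasting and anomaly detection on key indicators like military expenditure over time. By separating training and test data, researchers can build forecasting setups that assess the continuity and stability of fiscal policy and reveal patterns in macroeconomic management.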
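The provided train/test split is random by row, so a forecasting model trained on `train` sees years that fall between the years in `test`, which tends to inflate evaluation scores. A minimal sketch of a chronological alternative, assuming the two splits have first been recombined (for example with `pd.concat([train, test])`) and that one indicator is forecast at a time; the function name and the cutoff year are hypothetical choices for illustration.

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, code: str, cutoff_year: int):
    """Split one indicator's series into years <= cutoff (train) and > cutoff (test)."""
    series = (
        df.loc[df["indicator_code"] == code, ["year", "value"]]
        .sort_values("year")
        .reset_index(drop=True)
    )
    # Every training year precedes every test year, so there is no look-ahead.
    train = series[series["year"] <= cutoff_year]
    test = series[series["year"] > cutoff_year]
    return train, test
```

For example, `chronological_split(full, "MS.MIL.XPND.GD.ZS", 2015)` would train on years up to 2015 and evaluate on later years.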
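Research questions addressed
The dataset supports quantitative research in development economics on the efficiency of public resource allocation. By combining standardized indicators from the World Bank and the International Monetary Fund, it lets researchers assess South Africa's long-run performance in economic growth, social inclusion, and public governance. It provides empirical evidence for cross-country comparison and policy-effect analysis, contributing to data-driven evaluation of public policy.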
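Practical applications
In practice, government agencies and international organizations could use the dataset to monitor South Africa's fiscal sustainability and the transparency of its defense budget. Analysts can use the historical data to model the fiscal effects of different policy scenarios and to inform medium- and long-term development strategy. Non-governmental organizations could likewise use these indicators to assess the coverage of public services and to target livelihood programmes, making social interventions more precise and timely.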
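Recent research
Current research directions
In public-sector management and economic-policy analysis, the dataset's military expenditure indicators provide structured time-series data for studying defense economics and fiscal resource allocation. Current work applies machine learning models to such data to forecast government spending trends, assess fiscal sustainability, and analyze the macroeconomic effects of security policy. Against the backdrop of global geopolitical tensions, researchers are examining the dynamic links between military spending, economic development, and social welfare, using data-driven methods to shed light on public-sector efficiency and governance quality. Such work deepens understanding of policy effectiveness in emerging-market economies and gives international organizations an empirical basis for designing targeted assistance strategies.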
The content above was collected and summarized by 遇见数据集.



