electricsheepafrica/africa-world-bank-public-sector-indicators-for-eswatini
Source: Hugging Face · Updated: 2026-04-10 · Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-world-bank-public-sector-indicators-for-eswatini
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- economics
- indicators
- swz
pretty_name: "Eswatini - Public Sector"
dataset_info:
  splits:
  - name: train
    num_examples: 1744
  - name: test
    num_examples: 436
---
# Eswatini - Public Sector
**Publisher:** World Bank Group · **Source:** [HDX](https://data.humdata.org/dataset/world-bank-public-sector-indicators-for-eswatini) · **License:** `cc-by` · **Updated:** 2026-03-27
---
## Abstract
Contains data from the World Bank's [data portal](http://data.worldbank.org/). There is also a [consolidated country dataset](https://data.humdata.org/dataset/world-bank-combined-indicators-for-eswatini) on HDX.
Effective governments improve people's standard of living by ensuring access to essential services – health, education, water and sanitation, electricity, transport – and the opportunity to live and work in peace and security. Data here includes World Bank staff assessments of country performance in economic management, structural policies, policies for social inclusion and equity, and public sector management and institutions for the poorest countries. Also included are indicators on revenues and expenses from the International Monetary Fund's Government Finance Statistics, and on tax policies from various sources.
Each row is a country-level aggregate: the value of one indicator for Eswatini in one year. Data was last updated on HDX on 2026-03-27. Geographic scope: **SWZ**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Public sector (economics) |
| **Unit of observation** | Country-level aggregates |
| **Rows (total)** | 2,181 |
| **Columns** | 8 (2 numeric, 6 categorical, 0 datetime) |
| **Train split** | 1,744 rows |
| **Test split** | 436 rows |
| **Geographic scope** | SWZ |
| **Publisher** | World Bank Group |
| **HDX last updated** | 2026-03-27 |
---
## Variables
**Geographic:** `country_name` (Eswatini), `country_iso3` (SWZ), `year` (range 1977 to 2024).
**Outcome / Measurement:** `value` (range -4856337980.0 to 26122100000.0).
**Identifier / Metadata:** `indicator_name` (e.g. Military expenditure (current LCU), Military expenditure (% of GDP), Military expenditure (current USD)), `indicator_code` (e.g. MS.MIL.XPND.CN, MS.MIL.XPND.GD.ZS, MS.MIL.XPND.CD), `esa_source` (HDX), `esa_processed` (2026-04-10).
---
## Quick Start
```python
from datasets import load_dataset

# Download both splits from the Hugging Face Hub
ds = load_dataset("electricsheepafrica/africa-world-bank-public-sector-indicators-for-eswatini")

# Convert each split to a pandas DataFrame
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())
```
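
The data is in long format (one row per year and indicator), and the 80/20 split appears to be drawn at random over rows, so a single indicator's time series is likely scattered across both splits. For time-series work it may help to recombine the splits first, for example with `pd.concat([train, test])`, and then pivot to one column per indicator. The following is a minimal sketch under those assumptions; `to_wide` is a hypothetical helper, and the toy rows only mimic the card's schema with made-up values.

```python
import pandas as pd


def to_wide(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per year and one column per indicator code."""
    return df.pivot_table(
        index="year",
        columns="indicator_code",
        values="value",
        aggfunc="first",  # assumption: at most one value per (year, indicator)
    ).sort_index()


# Hypothetical toy rows that follow the card's schema; the values are made up.
toy = pd.DataFrame({
    "year": [2000, 2000, 2001, 2001],
    "indicator_code": ["MS.MIL.XPND.GD.ZS", "MS.MIL.XPND.CD"] * 2,
    "value": [1.5, 2.0e7, 1.6, 2.1e7],
})

wide = to_wide(toy)
print(wide)
```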
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `country_name` | object | 0.0% | Eswatini |
| `country_iso3` | object | 0.0% | SWZ |
| `year` | int64 | 0.0% | 1977 – 2024 (mean 2009.9743) |
| `indicator_name` | object | 0.0% | Military expenditure (current LCU), Military expenditure (% of GDP), Military expenditure (current USD) |
| `indicator_code` | object | 0.0% | MS.MIL.XPND.CN, MS.MIL.XPND.GD.ZS, MS.MIL.XPND.CD |
| `value` | float64 | 0.0% | -4856337980.0 to 26122100000.0 (mean 591530690.794) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-10 |
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `year` | 1977 | 2024 | 2009.9743 | 2010 |
| `value` | -4856337980.0 | 26122100000.0 | 591530690.794 | 20.8592 |
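
The gap between the mean of `value` (about 5.9e8) and its median (about 20.9) suggests that the column mixes indicators measured in different units; the sample indicators alone span current LCU, % of GDP, and current USD. Summary statistics are therefore probably more informative per indicator than over the whole column. A minimal sketch, assuming the schema above; `summarize_by_indicator` is a hypothetical helper and the toy values are made up.

```python
import pandas as pd


def summarize_by_indicator(df: pd.DataFrame) -> pd.DataFrame:
    """Count, min, max, mean, and median of `value` for each indicator code."""
    return (
        df.groupby("indicator_code")["value"]
        .agg(["count", "min", "max", "mean", "median"])
        .sort_index()
    )


# Hypothetical toy rows: a percentage indicator next to a USD indicator.
toy = pd.DataFrame({
    "indicator_code": ["MS.MIL.XPND.GD.ZS"] * 3 + ["MS.MIL.XPND.CD"] * 3,
    "value": [1.0, 2.0, 3.0, 1.0e7, 2.0e7, 6.0e7],
})

summary = summarize_by_indicator(toy)
print(summary)
```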
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
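
The cleaning and splitting steps described above can be sketched roughly as follows. This is an illustration, not the pipeline Electric Sheep Africa actually ran: the helper names (`to_snake_case`, `clean`, `split_80_20`) are hypothetical, the case-insensitive marker matching is an assumption, and `DataFrame.sample` stands in for whatever splitting routine was used, so it will not reproduce the published partitions. An 80/20 split of 2,180 rows gives 1,744 and 436, which matches the split sizes listed in this card. Writing the result with `DataFrame.to_parquet(path, compression="snappy")` would cover the final step.

```python
import re

import numpy as np
import pandas as pd

# Missing-value markers listed in the card.
MISSING_MARKERS = ["N/A", "null", "none", "-", "unknown", "no data", "#N/A"]


def to_snake_case(name: str) -> str:
    """Lowercase a column name and join its words with underscores."""
    parts = re.split(r"[^0-9a-zA-Z]+", name.strip())
    return "_".join(parts).lower().strip("_")


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Rename columns to snake_case and turn missing-value markers into NaN."""
    markers = {m.lower() for m in MISSING_MARKERS}

    def unify(cell):
        # Assumption: markers are matched case-insensitively after trimming.
        if isinstance(cell, str) and cell.strip().lower() in markers:
            return np.nan
        return cell

    out = df.rename(columns=to_snake_case)
    return out.apply(lambda col: col.map(unify))


def split_80_20(df: pd.DataFrame, seed: int = 42):
    """Random 80/20 row split with a fixed seed; expects a unique index."""
    train = df.sample(frac=0.8, random_state=seed)
    test = df.drop(train.index)
    return train, test


# Hypothetical raw rows; the values are made up.
raw = pd.DataFrame({
    "Country Name": ["Eswatini"] * 10,
    "Value": ["1", "N/A", "3", "null", "5", "-", "7", "8", "9", "10"],
})

cleaned = clean(raw)
train, test = split_80_20(cleaned)
print(cleaned.columns.tolist(), len(train), len(test))
```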
---
## Limitations
- Data originates from the World Bank Group and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/world-bank-public-sector-indicators-for-eswatini) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_world_bank_public_sector_indicators_for_eswatini,
title = {Eswatini - Public Sector},
author = {World Bank Group},
year = {2026},
url = {https://data.humdata.org/dataset/world-bank-public-sector-indicators-for-eswatini},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider:
electricsheepafrica
Collected summary
Dataset introduction

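Construction
This dataset is derived from the public sector indicators that the World Bank Group publishes on the Humanitarian Data Exchange (HDX), compiled for the African country of Eswatini (ISO code SWZ). The raw data was retrieved from HDX via the CKAN API and then cleaned and standardised by the Electric Sheep Africa team. Column names were converted to lowercase snake_case, common missing-value markers (such as N/A, null, and none) were unified to NaN, and the data was split 80/20 with a fixed random seed (42) into a training set (1,744 rows) and a test set (436 rows). The result is stored as Snappy-compressed Parquet so that it loads efficiently and works with mainstream machine learning frameworks.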
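Characteristics
The dataset contains 2,181 records covering economic and governance indicators for Eswatini's public sector from 1977 to 2024, including three measures of military expenditure (current LCU, % of GDP, and current USD). It draws on World Bank assessments of country performance (economic management, structural policies, policies for social inclusion and equity, and public sector institutions) together with the International Monetary Fund's Government Finance Statistics. Spanning several decades, it offers a time-series view for analysing the country's public governance effectiveness and economic development trajectory. The unit of observation is the country-level aggregate, and there are 8 fields (2 numeric, 6 categorical). The schema reports 0.0% null values for every column.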
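Usage
The dataset can be loaded with the Hugging Face `datasets` library by calling `load_dataset("electricsheepafrica/africa-world-bank-public-sector-indicators-for-eswatini")`, which returns the train and test splits; `to_pandas()` converts each split to a pandas DataFrame for further analysis. The card lists tabular classification as the task category. Researchers can model the `value` target from features such as `year` and `indicator_name`, or use the data for public finance trend forecasting and policy impact assessment. Note that the data originates from the World Bank and has not been independently validated by Electric Sheep Africa; consult the methodology notes on the original HDX page to guard against reporting bias and definitional inconsistencies.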
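Background and challenges
Background overview
The data comes from the World Bank Group, was last updated on HDX in 2026, and was curated into ML-ready Parquet format by the Electric Sheep Africa team. Its core research question is the assessment of Eswatini's public sector performance, covering economic management, structural policies, policies for social inclusion and equity, and public sector management. It holds country-level aggregates from 1977 to 2024 on military expenditure, tax policy, and government finance statistics, providing a quantitative basis for analysing how effective governments raise living standards by improving access to essential services (health, education, water, electricity, and transport). As a dataset focused on a sub-Saharan African country, it is a useful reference for public finance, development economics, and policy evaluation, particularly for comparative studies of governance effectiveness in resource-constrained countries.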
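Current challenges
The domain challenge this dataset addresses is that quantitative analysis of public sector indicators has long been limited by fragmented data, inconsistent definitions, and incomplete time series, which makes it hard to assess systematically how governance performance affects living standards. Because Eswatini is a relatively small economy, its fiscal and governance data is often folded into combined databases and is hard to extract on its own, and complex variable names and differing statistical definitions across sources raise the barrier to research. Construction challenges included unifying missing-value markers (such as N/A and null) after downloading the raw data through the CKAN API and standardising column names to snake_case. In addition, numeric values (for example military expenditure) contain extremes across the 47-year span (minimum about -4.86 billion, maximum about 26.12 billion), so outlier detection and cleaning need domain knowledge. Potential biases in the original World Bank methodology, together with the unvalidated status of the data, also call for caution about reliability when designing models.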
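
One way to surface the extremes mentioned above for manual review is an interquartile-range rule applied within each indicator, so that values in different units are not compared with each other. This is only a sketch of a common heuristic, not a step from the curation pipeline; `flag_outliers` and the multiplier `k=1.5` are assumptions, and the toy values are made up.

```python
import pandas as pd


def flag_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Mark values outside [Q1 - k*IQR, Q3 + k*IQR] within each indicator."""
    grouped = df.groupby("indicator_code")["value"]
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    return (df["value"] < q1 - k * iqr) | (df["value"] > q3 + k * iqr)


# Hypothetical toy rows: indicator "A" has one extreme value, "B" has none.
toy = pd.DataFrame({
    "indicator_code": ["A"] * 5 + ["B"] * 5,
    "value": [1.0, 2.0, 3.0, 4.0, 100.0, 10.0, 11.0, 12.0, 13.0, 14.0],
})

flags = flag_outliers(toy)
print(toy[flags])
```

Flagged rows are candidates for inspection rather than automatic removal, since a large value may be a genuine observation.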
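Common scenarios
Typical use cases
With Eswatini as its geographic unit, the dataset gathers long-run World Bank time-series data on public sector performance, covering assessments of economic management, structural policies, social inclusion and equity, and public sector institutions. Researchers can use these multi-year series to analyse how Eswatini's public governance and the structure of its fiscal revenue and expenditure have evolved, and to build regression models that explore how variables such as government spending efficiency and the military expenditure share relate to economic development and social welfare.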
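Academic problems addressed
In academic research, the dataset supports analysis of the relationship between public sector effectiveness in low-income countries and the Sustainable Development Goals. It helps with the long-standing lack of fine-grained, comparable governance indicators for developing countries, letting scholars quantify the effect of structural reform policies on the quality of economic management and the role of public finance transparency in promoting social inclusion. In this way it helps fill a gap in country-level empirical governance research on small African states.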
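Derived and related work
Electric Sheep Africa repackaged this dataset on Hugging Face as ML-friendly Parquet with a preset 80/20 train/test split, directly serving governance indicator prediction and fragility classification tasks. Related work includes standardised cleaning pipelines for public finance data from low-resource countries, research that uses time-series models to forecast government spending patterns, and frameworks for cross-regional comparison of government effectiveness using African countries as samples.
The above content was collected and summarised by 遇见数据集.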



