electricsheepafrica/africa-world-bank-infrastructure-indicators-for-south-sudan
Hugging Face · Updated 2026-04-10 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-world-bank-infrastructure-indicators-for-south-sudan
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- tabular-classification
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- facilities-infrastructure
- indicators
- ssd
pretty_name: "South Sudan - Infrastructure"
dataset_info:
  splits:
  - name: train
    num_examples: 219
  - name: test
    num_examples: 54
---
# South Sudan - Infrastructure
**Publisher:** World Bank Group · **Source:** [HDX](https://data.humdata.org/dataset/world-bank-infrastructure-indicators-for-south-sudan) · **License:** `cc-by` · **Updated:** 2026-03-27
---
## Abstract
This dataset contains data from the World Bank's [data portal](http://data.worldbank.org/). There is also a [consolidated country dataset](https://data.humdata.org/dataset/world-bank-combined-indicators-for-south-sudan) on HDX.
Infrastructure helps determine the success of manufacturing and agricultural activities. Investments in water, sanitation, energy, housing, and transport also improve lives and help reduce poverty. New information and communication technologies promote growth, improve delivery of health and other services, expand the reach of education, and support social and cultural advances. Data here are compiled from such sources as the International Road Federation, Containerisation International, the International Civil Aviation Organization, the International Energy Agency, and the International Telecommunication Union.
Each row in this dataset represents a country-level aggregate. The data was last updated on HDX on 2026-03-27. Geographic scope: **SSD**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Infrastructure |
| **Unit of observation** | Country-level aggregates |
| **Rows (total)** | 274 |
| **Columns** | 8 (2 numeric, 6 categorical, 0 datetime) |
| **Train split** | 219 rows |
| **Test split** | 54 rows |
| **Geographic scope** | SSD |
| **Publisher** | World Bank Group |
| **HDX last updated** | 2026-03-27 |
---
## Variables
**Geographic / Temporal**: `country_name` (South Sudan), `country_iso3` (SSD), `year` (range 2010–2024).
**Outcome / Measurement**: `value` (range 0.0–65396700.0).
**Identifier / Metadata**: `indicator_name` (sample values: Mobile cellular subscriptions, Fixed broadband subscriptions (per 100 people), Fixed broadband subscriptions), `indicator_code` (sample values: IT.CEL.SETS, IT.NET.BBND.P2, IT.NET.BBND), `esa_source` (HDX), `esa_processed` (2026-04-10).
---
## Quick Start
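```python
from datasets import load_dataset

# Download both splits from the Hugging Face Hub
ds = load_dataset("electricsheepafrica/africa-world-bank-infrastructure-indicators-for-south-sudan")

# Convert each split to a pandas DataFrame
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())  # print explicitly so this also works outside a notebook
```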
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `country_name` | object | 0.0% | South Sudan |
| `country_iso3` | object | 0.0% | SSD |
| `year` | int64 | 0.0% | 2010 – 2024 (mean 2016.8942) |
| `indicator_name` | object | 0.0% | Mobile cellular subscriptions, Fixed broadband subscriptions (per 100 people), Fixed broadband subscriptions |
| `indicator_code` | object | 0.0% | IT.CEL.SETS, IT.NET.BBND.P2, IT.NET.BBND |
| `value` | float64 | 0.0% | 0.0 – 65396700.0 (mean 728831.2783) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-10 |
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `year` | 2010 | 2024 | 2016.8942 | 2017 |
| `value` | 0.0 | 65396700.0 | 728831.2783 | 9.0245 |
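
The `value` column pools indicators measured in different units (absolute subscription counts alongside per-100-people rates), which is probably why the mean and median above differ so widely. A per-indicator summary is usually more informative. The sketch below is only an illustration: `summarize_by_indicator` is a hypothetical helper, and the `demo` rows are made-up placeholders that reuse the card's column names, not real values from the dataset.

```python
import pandas as pd


def summarize_by_indicator(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise `value` separately for each indicator code."""
    return df.groupby("indicator_code")["value"].agg(["count", "min", "max", "median"])


# Placeholder rows (NOT real data) using the column names from the schema above.
demo = pd.DataFrame({
    "indicator_code": ["IT.CEL.SETS", "IT.CEL.SETS", "IT.NET.BBND.P2", "IT.NET.BBND.P2"],
    "value": [1000.0, 3000.0, 0.1, 0.3],
})

summary = summarize_by_indicator(demo)
print(summary)
```

With the real data, pass the `train` DataFrame from the Quick Start section instead of `demo`.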
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
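
The steps described above can be sketched roughly as follows. This is not ESA's actual pipeline code. It is a minimal illustration, assuming pandas is used for every step. The helper names (`to_snake_case`, `clean`, `split_80_20`, `save_splits`), the output file names, and the use of `DataFrame.sample` for the split are all assumptions, and the `raw` frame holds made-up placeholder rows.

```python
import re

import numpy as np
import pandas as pd

# Missing-value markers listed in the Curation section, compared case-insensitively.
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def to_snake_case(name: str) -> str:
    """Lowercase a column name and collapse runs of other characters into underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def unify_missing(cell):
    """Map a known missing-value marker to NaN; leave every other cell unchanged."""
    if isinstance(cell, str) and cell.strip().lower() in MISSING_MARKERS:
        return np.nan
    return cell


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Rename columns to snake_case and unify missing-value markers to NaN."""
    renamed = df.rename(columns=to_snake_case)
    return renamed.apply(lambda col: col.map(unify_missing))


def split_80_20(df: pd.DataFrame, seed: int = 42):
    """Hold out a random 20% of rows as the test set (assumed splitting method)."""
    test = df.sample(frac=0.2, random_state=seed)
    train = df.drop(test.index)
    return train, test


def save_splits(train: pd.DataFrame, test: pd.DataFrame, prefix: str = "data") -> None:
    """Write Snappy-compressed Parquet files (hypothetical file names; needs pyarrow)."""
    train.to_parquet(f"{prefix}_train.parquet", compression="snappy")
    test.to_parquet(f"{prefix}_test.parquet", compression="snappy")


# Placeholder input (NOT real data) with un-normalised headers and two missing markers.
raw = pd.DataFrame({
    "Country Name": ["South Sudan"] * 10,
    "Indicator Code": ["IT.CEL.SETS"] * 10,
    "Value": ["1", "N/A", "3", "no data", "5", "6", "7", "8", "9", "10"],
})

cleaned = clean(raw)
train_df, test_df = split_80_20(cleaned)
```

`save_splits` is defined but not called here, so the sketch runs without a Parquet engine installed.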
---
## Limitations
- Data originates from World Bank Group and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/world-bank-infrastructure-indicators-for-south-sudan) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_world_bank_infrastructure_indicators_for_south_sudan,
title = {South Sudan - Infrastructure},
author = {World Bank Group},
year = {2026},
url = {https://data.humdata.org/dataset/world-bank-infrastructure-indicators-for-south-sudan},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
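Provider:
electricsheepafrica
Collected summary
Dataset introduction

Construction
In infrastructure development research, data quality directly affects the depth of analysis. This dataset is published by the World Bank Group; the raw data was obtained through the Humanitarian Data Exchange (HDX) and covers country-level infrastructure indicators for South Sudan from 2010 to 2024. The Electric Sheep Africa team cleaned and converted the data with an automated pipeline: after downloading the raw files via the CKAN API, it standardised the column names to snake_case and unified the various missing-value markers to NaN. Finally, the data was split 80/20 into training and test sets using a fixed random seed and stored as Snappy-compressed Parquet, making it ready for machine learning.
Usage
To support machine learning and data analysis tasks, the dataset is pre-split into training and test sets. Researchers can load it directly with the Hugging Face datasets library and convert it to a Pandas DataFrame in Python for exploration. A typical workflow is to load the data, filter by year or indicator code, and analyse how the infrastructure indicators change over time. Because the data has already been cleaned and standardised, users can focus on model building or statistical inference, but they should keep in mind that the original data may contain reporting bias. Consulting the World Bank's methodology notes is recommended for understanding indicator definitions and collection limitations.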
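
The filter-then-trend workflow described above might look like the sketch below. It is illustrative only: `indicator_series` is a hypothetical helper, and the `demo` rows are made-up placeholders that reuse the dataset's column names (`year`, `indicator_code`, `value`) so the example runs offline.

```python
import pandas as pd


def indicator_series(df: pd.DataFrame, indicator_code: str) -> pd.Series:
    """Return one indicator's values indexed by year, in chronological order."""
    subset = df[df["indicator_code"] == indicator_code]
    return subset.sort_values("year").set_index("year")["value"]


# Placeholder rows (NOT real data) with the same column names as the dataset.
demo = pd.DataFrame({
    "year": [2012, 2011, 2012, 2010],
    "indicator_code": ["IT.CEL.SETS", "IT.CEL.SETS", "IT.NET.BBND", "IT.CEL.SETS"],
    "value": [30.0, 20.0, 1.0, 10.0],
})

mobile = indicator_series(demo, "IT.CEL.SETS")  # mobile cellular subscriptions
yearly_change = mobile.diff()                   # year-over-year difference
print(mobile)
print(yearly_change)
```

With the real data, the same function can be applied to a DataFrame obtained from `load_dataset(...)["train"].to_pandas()`. Because the train and test splits are random samples of rows, a single indicator's years may be spread across both splits.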
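Background and challenges
Background overview
Infrastructure development is a key dimension for measuring a country's economic and social progress, and in Africa in particular, reliable data is essential for policy-making and academic research. Published by the World Bank Group in 2026 and repackaged into a machine-learning-ready format by Electric Sheep Africa, the "South Sudan infrastructure indicators dataset" focuses on the state of infrastructure in South Sudan, a young country, between 2010 and 2024. Its core research question is to quantify and analyse South Sudan's progress in key information and communication technology areas such as mobile communications and fixed broadband. The aim is to provide an empirical basis for development economics, public policy, and humanitarian intervention; the structured data helps reveal the links between infrastructure investment, poverty reduction, and economic growth.
Current challenges
The dataset targets the domain challenges of analysing and forecasting infrastructure indicators: how to accurately infer the socio-economic impact of infrastructure development from limited country-level aggregates, and how to build robust machine learning models when data is sparse. On the construction side, the challenges stem mainly from inherent limitations of the raw data: World Bank data may contain reporting inconsistencies, definitional differences, and sampling bias, which automated cleaning can hardly correct. In addition, the dataset is small (only 274 rows) and covers a limited time span, which may constrain model generalisation and the reliability of long-term trend analysis.
Common scenarios
Classic use cases
In research on infrastructure and economic development, this dataset provides key data for analysing the trajectory of South Sudan's communications infrastructure. Researchers typically use its time-series indicators, such as mobile cellular subscriptions and fixed broadband subscriptions, to build regression models or trend analyses that assess the potential impact of infrastructure investment on regional economic growth. Such applications are common in development economics and public policy analysis, where the goal is to reveal the dynamic relationship between infrastructure indicators and macro-level socio-economic variables.
Academic problems addressed
The dataset addresses several core problems in the quantitative assessment of infrastructure in development research, in particular the lack of empirical data for fragile states and post-conflict reconstruction settings. By providing standardised country-level aggregate indicators, it supports scholars in testing causal relationships between infrastructure access and poverty reduction, wider access to education, or improved health services. Its significance lies in offering reproducible benchmark data for interdisciplinary research and in promoting evidence-based policy evaluation in African regional studies.
Practical applications
In practice, international organisations, non-governmental organisations, and government planning departments use the dataset to monitor progress in South Sudan's infrastructure coverage. For example, when drawing up a strategy for extending regional communications networks, decision-makers can use historical subscription data to forecast demand gaps and optimise resource allocation. Humanitarian aid agencies can likewise use these indicators to evaluate the effectiveness of post-disaster recovery projects and ensure that infrastructure investment matches the actual needs of communities.
Recent research on the dataset
Latest research directions
In research on African infrastructure development, the infrastructure indicator dataset for South Sudan, a young country, is becoming a key resource for assessing the spread of communications technology and the digital divide. Current frontier research focuses on using machine learning models to analyse mobile cellular and fixed broadband subscription data in order to forecast regional connectivity trends and assess progress towards the Sustainable Development Goals. Amid the global wave of digital transformation, related developments include international initiatives to invest in Africa's digital infrastructure. The dataset's impact lies in giving policymakers an empirical basis for narrowing regional development gaps and promoting inclusive economic growth.
The above content was collected and summarised by 遇见数据集.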



