electricsheepafrica/africa-world-bank-social-development-indicators-for-south-sudan
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- tabular-classification
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- development
- indicators
- ssd
pretty_name: "South Sudan - Social Development"
dataset_info:
  splits:
  - name: train
    num_examples: 556
  - name: test
    num_examples: 139
---
# South Sudan - Social Development
**Publisher:** World Bank Group · **Source:** [HDX](https://data.humdata.org/dataset/world-bank-social-development-indicators-for-south-sudan) · **License:** `cc-by` · **Updated:** 2026-03-27
---
## Abstract
Contains data from the World Bank's [data portal](http://data.worldbank.org/). There is also a [consolidated country dataset](https://data.humdata.org/dataset/world-bank-combined-indicators-for-south-sudan) on HDX.
Data here cover child labor, gender issues, refugees, and asylum seekers. Children in many countries work long hours, often combining studying with work for pay. The data on their paid work are from household surveys conducted by the International Labour Organization (ILO), the United Nations Children's Fund (UNICEF), the World Bank, and national statistical offices. Gender disparities are measured using a compilation of data on key topics such as education, health, labor force participation, and political participation. Data on refugees are from the United Nations High Commissioner for Refugees complemented by statistics on Palestinian refugees under the mandate of the United Nations Relief and Works Agency.
Each row is a country-level aggregate. Data was last updated on HDX on 2026-03-27. Geographic scope: **SSD**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Public health |
| **Unit of observation** | Country-level aggregates |
| **Rows (total)** | 696 (the train and test splits listed below sum to 695) |
| **Columns** | 8 (2 numeric, 6 categorical, 0 datetime) |
| **Train split** | 556 rows |
| **Test split** | 139 rows |
| **Geographic scope** | SSD |
| **Publisher** | World Bank Group |
| **HDX last updated** | 2026-03-27 |
---
## Variables
**Geographic:** `country_name` (South Sudan), `country_iso3` (SSD), `year` (range 1960–2024).
**Outcome / Measurement:** `value` (range 0.3–159.415).
**Identifier / Metadata:** `indicator_name` (e.g. "Life expectancy at birth, male (years)"; "Life expectancy at birth, female (years)"; "Adolescent fertility rate (births per 1,000 women ages 15-19)"), `indicator_code` (e.g. SP.DYN.LE00.MA.IN, SP.DYN.LE00.FE.IN, SP.ADO.TFRT), `esa_source` (HDX), `esa_processed` (2026-04-10).
---
## Quick Start
```python
from datasets import load_dataset
ds = load_dataset("electricsheepafrica/africa-world-bank-social-development-indicators-for-south-sudan")
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()
print(train.shape)
train.head()
```
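
The data are in long format: every indicator shares the single `value` column, so most analyses start by selecting one `indicator_code` or pivoting to one column per indicator. The following is a minimal sketch of that step, assuming the column names listed in the Schema section. The helper name `to_wide` is hypothetical, and the demo frame holds made-up numbers rather than real values from the dataset.

```python
import pandas as pd


def to_wide(df: pd.DataFrame) -> pd.DataFrame:
    """Pivot long-format rows to one row per year, one column per indicator code."""
    return df.pivot_table(
        index="year",
        columns="indicator_code",
        values="value",
        aggfunc="first",  # assumes at most one value per (year, indicator) pair
    ).sort_index()


# Tiny made-up frame with the same column names as the dataset (NOT real data).
demo = pd.DataFrame({
    "year": [2000, 2000, 2001, 2001],
    "indicator_code": ["SP.DYN.LE00.MA.IN", "SP.DYN.LE00.FE.IN"] * 2,
    "value": [50.0, 53.0, 50.5, 53.5],
})

wide = to_wide(demo)
# Female-minus-male gap per year, computed from the pivoted columns.
gap = wide["SP.DYN.LE00.FE.IN"] - wide["SP.DYN.LE00.MA.IN"]
print(gap)
```

On the real data, `to_wide(train)` should work the same way. Because the train/test split is random rather than chronological, a series taken from `train` alone will probably have missing years; concatenating both splits with `pd.concat([train, test])` is one way to recover the full series.
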
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `country_name` | object | 0.0% | South Sudan |
| `country_iso3` | object | 0.0% | SSD |
| `year` | int64 | 0.0% | 1960 – 2024 (mean 2002.806) |
| `indicator_name` | object | 0.0% | "Life expectancy at birth, male (years)"; "Life expectancy at birth, female (years)"; "Adolescent fertility rate (births per 1,000 women ages 15-19)" |
| `indicator_code` | object | 0.0% | SP.DYN.LE00.MA.IN, SP.DYN.LE00.FE.IN, SP.ADO.TFRT |
| `value` | float64 | 0.0% | 0.3 – 159.415 (mean 58.2814) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-10 |
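
A loaded split can be checked against this table before use. The sketch below is one hedged way to do that: it compares each column's dtype kind and null status with the Schema table. The names `EXPECTED`, `KIND_CHECKS`, and `schema_problems` are hypothetical, and the demo rows use made-up values.

```python
import pandas as pd
from pandas.api import types as pdt

# Expected kind per column, taken from the Schema table above.
EXPECTED = {
    "country_name": "text",
    "country_iso3": "text",
    "year": "int",
    "indicator_name": "text",
    "indicator_code": "text",
    "value": "float",
    "esa_source": "text",
    "esa_processed": "text",
}
KIND_CHECKS = {
    "text": pdt.is_string_dtype,
    "int": pdt.is_integer_dtype,
    "float": pdt.is_float_dtype,
}


def schema_problems(df: pd.DataFrame) -> list:
    """Return human-readable schema mismatches (empty list if none)."""
    problems = []
    for col, kind in EXPECTED.items():
        if col not in df.columns:
            problems.append(f"{col}: missing")
        elif not KIND_CHECKS[kind](df[col]):
            problems.append(f"{col}: expected {kind}, got {df[col].dtype}")
        elif df[col].isna().any():
            problems.append(f"{col}: contains nulls")
    return problems


# Two made-up rows shaped like the dataset (values are NOT real data).
good = pd.DataFrame({
    "country_name": ["South Sudan"] * 2,
    "country_iso3": ["SSD"] * 2,
    "year": [2000, 2001],
    "indicator_name": ["Life expectancy at birth, male (years)"] * 2,
    "indicator_code": ["SP.DYN.LE00.MA.IN"] * 2,
    "value": [50.0, 50.5],
    "esa_source": ["HDX"] * 2,
    "esa_processed": ["2026-04-10"] * 2,
})
bad = good.assign(year=good["year"].astype("float64"))

print(schema_problems(good))
print(schema_problems(bad))
```

The check tests dtype kinds rather than exact dtype names because the text columns may load as `object` or as a dedicated string dtype depending on the pandas version.
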
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `year` | 1960 | 2024 | 2002.806 | 2005 |
| `value` | 0.3 | 159.415 | 58.2814 | 66.411 |
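
The `value` statistics above pool indicators measured in different units (for example, years of life expectancy and births per 1,000 women), so a per-indicator summary is usually easier to interpret. A minimal sketch, assuming the `indicator_code` and `value` columns from the Schema section; the helper name `summarize_by_indicator` is hypothetical and the demo values are made up.

```python
import pandas as pd


def summarize_by_indicator(df: pd.DataFrame) -> pd.DataFrame:
    """Min, max, mean and median of `value`, computed separately per indicator."""
    return df.groupby("indicator_code")["value"].agg(["min", "max", "mean", "median"])


# Made-up values for two indicator codes (NOT real data).
demo = pd.DataFrame({
    "indicator_code": ["SP.ADO.TFRT"] * 3 + ["SP.DYN.LE00.FE.IN"] * 3,
    "value": [100.0, 110.0, 120.0, 50.0, 52.0, 57.0],
})

summary = summarize_by_indicator(demo)
print(summary)
```
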
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
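
The steps described above can be approximated as follows. This is a sketch under stated assumptions, not the actual Electric Sheep Africa pipeline: the card does not say which splitter was used, so `DataFrame.sample` with `random_state=42` stands in for it, marker matching is assumed to be exact and case-sensitive, and the names `to_snake_case` and `curate` are hypothetical. The download and Parquet-writing steps are omitted.

```python
import re

import numpy as np
import pandas as pd

# Missing-value markers listed in the Curation paragraph.
MISSING_MARKERS = ["N/A", "null", "none", "-", "unknown", "no data", "#N/A"]


def to_snake_case(name: str) -> str:
    """Lowercase a column name and join its words with underscores."""
    s = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s)
    return s.strip("_").lower()


def curate(df: pd.DataFrame, seed: int = 42):
    """Normalise column names, unify missing markers, and split 80/20."""
    out = df.rename(columns=to_snake_case)
    out = out.replace(MISSING_MARKERS, np.nan)
    # Assumed splitter: a seeded 80% sample for train, the remainder for test.
    train = out.sample(frac=0.8, random_state=seed)
    test = out.drop(train.index)
    return train, test


# Made-up raw table with one missing-value marker (NOT real data).
raw = pd.DataFrame({
    "Country Name": ["South Sudan"] * 10,
    "Value": ["1.0", "N/A", "3.0", "4.0", "5.0", "6.0", "7.0", "8.0", "9.0", "10.0"],
})

train, test = curate(raw)
print(len(train), len(test))
```

Under this assumed approach, a table of 695 rows would yield 556 training rows and 139 test rows, which agrees with the split sizes reported in this card. Which rows land in each split still depends on the splitter that was actually used.
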
---
## Limitations
- Data originates from the World Bank Group and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/world-bank-social-development-indicators-for-south-sudan) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_world_bank_social_development_indicators_for_south_sudan,
title = {South Sudan - Social Development},
author = {World Bank Group},
year = {2026},
url = {https://data.humdata.org/dataset/world-bank-social-development-indicators-for-south-sudan},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provided by:
electricsheepafrica
Collected Summary
Dataset Overview

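Construction
In the social sciences and public health, datasets are typically built from systematic collection and consolidation by authoritative institutions. This dataset originates from the World Bank Group's data portal, with the raw files obtained through the Humanitarian Data Exchange (HDX). Its source collection covers social indicators on child labor, gender issues, refugees, and asylum seekers, drawing on household surveys by the International Labour Organization, UNICEF, and national statistical offices. The Electric Sheep Africa team then standardised the raw data by normalising column names and unifying missing-value markers, split it into training and test sets using a fixed random seed, and packaged it as Parquet to give machine learning applications a structured starting point.
Characteristics
The dataset focuses on social development indicators for South Sudan and combines a narrow domain with a long time span. Its unit of observation is the country-level aggregate, with observations running from 1960 to 2024 and including indicators such as male and female life expectancy at birth and the adolescent fertility rate. The structure is compact: eight columns (two numeric, six categorical) with no missing values. The data are already divided into train and test subsets, so they can be used directly for model development and validation.
Usage
Users can load the dataset with the Hugging Face `datasets` library and convert it to a pandas DataFrame for further analysis. The predefined train and test splits support feature engineering, statistical modelling, or time-series analysis without extra preparation. Given the tabular structure and explicit indicator codes, researchers can study specific topics such as health inequality or population dynamics, compare values across years, or build regression models. Consulting the methodology notes of the original data source is recommended in order to understand indicator definitions and potential limitations and to keep conclusions robust.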
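Background and Challenges
Background
As the global development agenda and research on Africa have deepened, the World Bank Group, together with the International Labour Organization, UNICEF, and other bodies, has systematically collected and published social development indicators on topics such as child labor, gender equality, and refugees. Electric Sheep Africa curated this dataset and adapted it for machine learning in 2026. It focuses on South Sudan, a country rebuilding after conflict, and aims to provide a longitudinal baseline of country-level aggregates for public health, social policy, and development economics. Its central research question is how to quantify the trajectory of social progress in fragile states, and it offers an empirical basis for international comparison and for assessing development interventions, which makes it a useful reference for understanding resilience in post-conflict societies.
Current Challenges
The dataset addresses the problem of measuring and predicting socio-economic development along several dimensions. In regions with scarce resources and weak data collection systems, accurately capturing changes in sensitive topics such as child labor, adolescent fertility, and gender gaps is the core difficulty. During construction, data integration has to cope with differing survey methods, statistical definitions that change over time, and complex handling of missing values. Automated cleaning cannot correct reporting bias or definitional inconsistencies in the source data, so users need to interpret results carefully alongside the original methodology notes to avoid systematic errors in model inference.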
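Common Scenarios
Typical Use Cases
In the social sciences and public health, this dataset supports research on South Sudan's social development trajectory. Its time-series indicators, such as life expectancy at birth and the adolescent fertility rate, can be used in regression models or time-series analysis to reveal long-term trends in gender differences and health. Such analyses describe South Sudan's social development at the macro level and give a quantitative basis for understanding social resilience in post-conflict countries.
Practical Applications
In practice, international organizations, non-governmental organizations, and policymakers can use the dataset to monitor South Sudan's social development. For example, United Nations agencies can use adolescent fertility data to design targeted public health programmes, and humanitarian organizations can refer to life expectancy indicators when assessing the effect of medical assistance. The data can inform resource allocation decisions and local implementation of the Sustainable Development Goals.
Related Work
Related lines of work include using machine learning to forecast trends in South Sudan's health and social indicators and combining the data with geospatial data for multidimensional poverty analysis. The dataset can also be merged with datasets for other African countries for comparative development research on the links between conflict, climate, and social resilience.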
The content above was collected and summarized by 遇见数据集.



