electricsheepafrica/africa-world-bank-infrastructure-indicators-for-somalia-fed-rep
Hosted on Hugging Face · Updated 2026-04-09 · Indexed 2026-04-12
Download link: https://hf-mirror.com/datasets/electricsheepafrica/africa-world-bank-infrastructure-indicators-for-somalia-fed-rep
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- tabular-classification
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- facilities-infrastructure
- indicators
- som
pretty_name: "Somalia, Fed. Rep. - Infrastructure"
dataset_info:
  splits:
  - name: train
    num_examples: 756
  - name: test
    num_examples: 189
---
# Somalia, Fed. Rep. - Infrastructure
**Publisher:** World Bank Group · **Source:** [HDX](https://data.humdata.org/dataset/world-bank-infrastructure-indicators-for-somalia-fed-rep) · **License:** `cc-by` · **Updated:** 2026-03-27
---
## Abstract
Contains data from the World Bank's [data portal](http://data.worldbank.org/). There is also a [consolidated country dataset](https://data.humdata.org/dataset/world-bank-combined-indicators-for-somalia-fed-rep) on HDX.
Infrastructure helps determine the success of manufacturing and agricultural activities. Investments in water, sanitation, energy, housing, and transport also improve lives and help reduce poverty. New information and communication technologies promote growth, improve delivery of health and other services, expand the reach of education, and support social and cultural advances. Data here are compiled from sources such as the International Road Federation, Containerisation International, the International Civil Aviation Organization, the International Energy Agency, and the International Telecommunication Union.
Each row holds one country-level value for a single indicator and year. Data was last updated on HDX on 2026-03-27. Geographic scope: **SOM**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Infrastructure |
| **Unit of observation** | Country-level aggregates |
| **Rows (total)** | 946 (the train and test splits listed below sum to 945) |
| **Columns** | 8 (2 numeric, 6 categorical, 0 datetime) |
| **Train split** | 756 rows |
| **Test split** | 189 rows |
| **Geographic scope** | SOM |
| **Publisher** | World Bank Group |
| **HDX last updated** | 2026-03-27 |
---
## Variables
**Geographic:** `country_name` (Somalia, Fed. Rep.), `country_iso3` (SOM), `year` (range 1960–2024).
**Outcome / Measurement:** `value` (range 0.0–442000000.0).
**Identifier / Metadata:** `indicator_name` (e.g. "Renewable internal freshwater resources per capita (cubic meters)"; "Renewable internal freshwater resources, total (billion cubic meters)"; "Fixed telephone subscriptions"), `indicator_code` (e.g. `ER.H2O.INTR.PC`, `ER.H2O.INTR.K3`, `IT.MLT.MAIN`), `esa_source` (HDX), `esa_processed` (2026-04-09).
---
## Quick Start
```python
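from datasets import load_dataset

# Download the dataset from the Hugging Face Hub (requires network access).
ds = load_dataset("electricsheepafrica/africa-world-bank-infrastructure-indicators-for-somalia-fed-rep")

# Convert each split to a pandas DataFrame.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)   # (rows, columns)
print(train.head())  # first five rows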
```
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `country_name` | object | 0.0% | Somalia, Fed. Rep. |
| `country_iso3` | object | 0.0% | SOM |
| `year` | int64 | 0.0% | 1960 – 2024 (mean 2000.0011) |
| `indicator_name` | object | 0.0% | "Renewable internal freshwater resources per capita (cubic meters)"; "Renewable internal freshwater resources, total (billion cubic meters)"; "Fixed telephone subscriptions" |
| `indicator_code` | object | 0.0% | ER.H2O.INTR.PC, ER.H2O.INTR.K3, IT.MLT.MAIN |
| `value` | float64 | 0.0% | 0.0 – 442000000.0 (mean 1440436.2511) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-09 |
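
The schema above is in long format: one row per indicator and year. For time-series work it is often easier to have one column per indicator code. The following is a minimal sketch of that reshaping, assuming pandas is installed. The helper name `to_wide` is hypothetical and the values in the demo frame are invented purely to show the shape of the result.

```python
import pandas as pd

def to_wide(df: pd.DataFrame) -> pd.DataFrame:
    """Reshape long rows (year, indicator_code, value) to one column per indicator."""
    wide = df.pivot_table(
        index="year",
        columns="indicator_code",
        values="value",
        aggfunc="first",  # keep the first value if a (year, code) pair repeats
    )
    return wide.sort_index()

# Invented values, not taken from the dataset.
demo = pd.DataFrame({
    "year": [2000, 2000, 2001],
    "indicator_code": ["ER.H2O.INTR.K3", "IT.MLT.MAIN", "IT.MLT.MAIN"],
    "value": [1.0, 2.0, 3.0],
})
wide = to_wide(demo)
print(wide)
```

With the real data, one option is to pass `pd.concat([train, test])` so that years assigned to different splits end up in the same table. Year and indicator combinations with no row appear as `NaN`.
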
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `year` | 1960.0 | 2024.0 | 2000.0011 | 2002.5 |
| `value` | 0.0 | 442000000.0 | 1440436.2511 | 6.0 |
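
The `value` statistics above pool indicators that are measured in different units (cubic meters, billion cubic meters, subscription counts), which may account for the large gap between the mean and the median. A per-indicator summary is usually more informative. The sketch below assumes pandas; `summarize_by_indicator` is a hypothetical helper and the demo values are invented.

```python
import pandas as pd

def summarize_by_indicator(df: pd.DataFrame) -> pd.DataFrame:
    """Return min, max, mean, and median of `value` for each indicator code."""
    return df.groupby("indicator_code")["value"].agg(["min", "max", "mean", "median"])

# Invented values, not taken from the dataset.
demo = pd.DataFrame({
    "indicator_code": ["ER.H2O.INTR.K3", "ER.H2O.INTR.K3", "IT.MLT.MAIN", "IT.MLT.MAIN"],
    "value": [1.0, 3.0, 100.0, 300.0],
})
summary = summarize_by_indicator(demo)
print(summary)
```
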
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
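
The steps described above can be sketched as follows. This is an illustration under assumptions, not the pipeline Electric Sheep Africa actually runs: the function names, the regular expression used for snake_case, the case-insensitive marker matching, and the use of `DataFrame.sample` for the split are all choices made here. The CKAN download step is omitted.

```python
import re

import numpy as np
import pandas as pd

# Missing-value markers listed in the curation notes (compared in lowercase).
MISSING = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}

def to_snake_case(name: str) -> str:
    """Lowercase a column name and replace runs of other characters with '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Standardise column names and turn missing-value markers into NaN."""
    out = df.rename(columns=to_snake_case)
    # Only string cells can match a marker; everything else is left alone.
    is_missing = out.apply(
        lambda col: col.map(lambda v: isinstance(v, str) and v.strip().lower() in MISSING)
    ).astype(bool)
    return out.mask(is_missing, np.nan)

def split_80_20(df: pd.DataFrame, seed: int = 42):
    """Randomly hold out 20% of rows as the test set, using a fixed seed."""
    test = df.sample(frac=0.2, random_state=seed)
    train = df.drop(test.index)
    return train, test

# Invented raw rows, not taken from the dataset.
raw = pd.DataFrame({
    "Country Name": ["Somalia, Fed. Rep."] * 5,
    "Value": ["1", "N/A", "3", "-", "5"],
})
cleaned = clean(raw)
train, test = split_80_20(cleaned)
print(cleaned)
print(len(train), len(test))
```

To store the splits in the format the card describes, `train.to_parquet("train.parquet", compression="snappy")` would work, provided a Parquet engine such as pyarrow is installed.
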
---
## Limitations
- Data originates from the World Bank Group and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/world-bank-infrastructure-indicators-for-somalia-fed-rep) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_world_bank_infrastructure_indicators_for_somalia_fed_rep,
title = {Somalia, Fed. Rep. - Infrastructure},
author = {World Bank Group},
year = {2026},
url = {https://data.humdata.org/dataset/world-bank-infrastructure-indicators-for-somalia-fed-rep},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provided by: electricsheepafrica



