electricsheepafrica/africa-world-bank-trade-indicators-for-federal-republic-of-somalia
Hugging Face · Updated: 2026-04-07 · Indexed: 2026-04-12
Download link: https://hf-mirror.com/datasets/electricsheepafrica/africa-world-bank-trade-indicators-for-federal-republic-of-somalia
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- hxl
- indicators
- trade
- som
pretty_name: "Federal Republic of Somalia - Trade"
dataset_info:
  splits:
  - name: train
    num_examples: 1880
  - name: test
    num_examples: 470
---
# Federal Republic of Somalia - Trade
**Publisher:** World Bank Group · **Source:** [HDX](https://data.humdata.org/dataset/world-bank-trade-indicators-for-federal-republic-of-somalia) · **License:** `cc-by` · **Updated:** 2025-11-04
---
## Abstract
Contains data from the World Bank's [data portal](http://data.worldbank.org/). There is also a [consolidated country dataset](https://data.humdata.org/dataset/world-bank-combined-indicators-for-federal-republic-of-somalia) on HDX.
Trade is a key means to fight poverty and achieve the Millennium Development Goals, specifically by improving developing-country access to markets and supporting a rules-based, predictable trading system. In cooperation with other international development partners, the World Bank launched the Transparency in Trade Initiative to provide free and easy access to data on country-specific trade policies.
Each row holds one country-level aggregate: the `value` of one indicator (`indicator_code`) for one `year`. The data was last updated on HDX on 2025-11-04. Geographic scope: **SOM**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| Property | Value |
|---|---|
| **Domain** | Poverty and economic vulnerability |
| **Unit of observation** | Country-level aggregates |
| **Rows (total)** | 2,351 |
| **Columns** | 8 (2 numeric, 6 categorical, 0 datetime) |
| **Train split** | 1,880 rows |
| **Test split** | 470 rows |
| **Geographic scope** | SOM |
| **Publisher** | World Bank Group |
| **HDX last updated** | 2025-11-04 |
---
## Variables
**Geographic:** `country_name` (sample values: `Federal Republic of Somalia`, `#country+name`), `country_iso3` (sample values: `SOM`, `#country+code`), `year` (range 1960.0 to 2024.0, stored as float64). The `#country+name` and `#country+code` sample values are HXL hashtags rather than country data.
**Outcome / Measurement:** `value` (range -6577585549.998 to 65735434041109.8).
**Identifier / Metadata:** `indicator_name` (sample values: "Merchandise imports from high-income economies (% of total merchandise imports)", "Merchandise exports to low- and middle-income economies in Middle East & North Africa (% of total merchandise exports)", "Merchandise imports by the reporting economy, residual (% of total merchandise imports)"), `indicator_code` (sample values: `TM.VAL.MRCH.HI.ZS`, `TX.VAL.MRCH.R4.ZS`, `TM.VAL.MRCH.RS.ZS`), `esa_source` (`HDX`), `esa_processed` (`2026-04-07`).
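
Because the table is in long format (one row per indicator and year), a common first step is to pivot it so that each indicator becomes its own column. The sketch below is only an illustration: it uses a small hand-made frame with the same column names, and the numbers are invented rather than taken from the dataset. Note that `pivot_table` averages duplicate (year, indicator) pairs by default.

```python
import pandas as pd

# Hand-made sample mirroring the dataset's columns; values are illustrative only.
long_df = pd.DataFrame({
    "year": [2019.0, 2020.0, 2019.0, 2020.0],
    "indicator_code": [
        "TM.VAL.MRCH.HI.ZS", "TM.VAL.MRCH.HI.ZS",
        "TX.VAL.MRCH.R4.ZS", "TX.VAL.MRCH.R4.ZS",
    ],
    "value": [42.0, 47.0, 30.0, 28.5],
})

# One row per year, one column per indicator code.
wide_df = long_df.pivot_table(index="year", columns="indicator_code", values="value")
print(wide_df)
```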
---
## Quick Start
```python
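from datasets import load_dataset

# Download the dataset from the Hugging Face Hub (requires network access)
ds = load_dataset("electricsheepafrica/africa-world-bank-trade-indicators-for-federal-republic-of-somalia")

# Convert each split to a pandas DataFrame
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)   # (rows, columns) of the training split
print(train.head())  # first five rows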
```
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `country_name` | object | 0.0% | Federal Republic of Somalia, #country+name |
| `country_iso3` | object | 0.0% | SOM, #country+code |
| `year` | float64 | 0.0% | 1960.0 – 2024.0 (mean 1994.1298) |
| `indicator_name` | object | 0.0% | Merchandise imports from high-income economies (% of total merchandise imports), Merchandise exports to low- and middle-income economies in Middle East & North Africa (% of total merchandise exports), Merchandise imports by the reporting economy, residual (% of total merchandise imports) |
| `indicator_code` | object | 0.0% | TM.VAL.MRCH.HI.ZS, TX.VAL.MRCH.R4.ZS, TM.VAL.MRCH.RS.ZS |
| `value` | float64 | 0.0% | -6577585549.998 – 65735434041109.8 (mean 208510043030.7533) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-07 |
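
The schema lists HXL hashtags (`#country+name`, `#country+code`) among the sample values and stores `year` as float64. A minimal clean-up sketch follows, assuming a hashtag row may be present as a data row (the card does not confirm this) and that all years are whole numbers. The helper name `tidy` and the sample frame are hypothetical.

```python
import pandas as pd

def tidy(df: pd.DataFrame) -> pd.DataFrame:
    """Drop HXL hashtag rows, if any, and store year as a nullable integer."""
    # HXL hashtags start with '#', for example '#country+code'.
    is_hxl = df["country_iso3"].astype(str).str.startswith("#")
    out = df.loc[~is_hxl].copy()
    # Int64 (capital I) is pandas' nullable integer dtype.
    out["year"] = out["year"].astype("Int64")
    return out

# Hand-made sample; the first row imitates a leaked HXL tag row.
sample = pd.DataFrame({
    "country_iso3": ["#country+code", "SOM", "SOM"],
    "year": [float("nan"), 2019.0, 2020.0],
    "value": [float("nan"), 42.0, 47.0],
})

clean = tidy(sample)
print(clean)
```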
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `year` | 1960.0 | 2024.0 | 1994.1298 | 1994.0 |
| `value` | -6577585549.998 | 65735434041109.8 | 208510043030.7533 | 15.2309 |
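
The gap between the mean and the median of `value` suggests that the column pools indicators on very different scales, such as percentage shares alongside much larger absolute amounts. Summarising per indicator is therefore usually more informative than pooling. The sketch below uses invented numbers, and the second indicator code (`TM.VAL.MRCH.CD.WT`) is included purely as an assumed example of an absolute-value series; the card does not confirm that it is in this dataset.

```python
import pandas as pd

# Hand-made sample: one percentage-share series and one absolute-value series.
sample = pd.DataFrame({
    "indicator_code": ["TM.VAL.MRCH.HI.ZS"] * 3 + ["TM.VAL.MRCH.CD.WT"] * 3,
    "year": [2018, 2019, 2020] * 2,
    "value": [40.0, 42.0, 47.0, 1.0e9, 1.2e9, 1.4e9],
})

# Pooled statistics mix the two scales and are hard to interpret.
pooled = sample["value"].agg(["mean", "median"])

# Per-indicator statistics keep each scale separate.
per_indicator = sample.groupby("indicator_code")["value"].agg(["min", "median", "max"])

print(pooled)
print(per_indicator)
```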
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. Two columns were cast from string to a numeric or datetime type because more than 85% of their values parsed successfully. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
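
The steps above can be approximated in pandas as follows. This is a sketch under stated assumptions, not Electric Sheep Africa's actual pipeline: the function names are hypothetical, markers are matched case-insensitively, the parse-success rate is computed over non-missing values, only numeric casting is shown, and the split uses `DataFrame.sample`, which may not reproduce the published partitions.

```python
import re
import pandas as pd

# Missing-value markers from the curation notes, lowercased for matching (assumption).
MISSING = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}

def snake_case(name: str) -> str:
    """Lowercase a column name and join its alphanumeric parts with underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")

def curate(raw: pd.DataFrame, threshold: float = 0.85) -> pd.DataFrame:
    df = raw.rename(columns=snake_case)
    for col in df.columns:
        # Replace known missing-value markers with NaN.
        is_marker = df[col].astype(str).str.strip().str.lower().isin(MISSING)
        df[col] = df[col].mask(is_marker)
        # Cast to numeric only if enough of the non-missing values parse.
        present = df[col].notna().sum()
        parsed = pd.to_numeric(df[col], errors="coerce")
        if present and parsed.notna().sum() / present > threshold:
            df[col] = parsed
    return df

def split(df: pd.DataFrame, test_frac: float = 0.2, seed: int = 42):
    """Random train/test split with a fixed seed."""
    test = df.sample(frac=test_frac, random_state=seed)
    train = df.drop(test.index)
    return train, test

# Hand-made raw sample; values are illustrative only.
raw = pd.DataFrame({
    "Country ISO3": ["SOM"] * 10,
    "Year": [str(y) for y in range(2015, 2025)],
    "Value": ["1.5", "N/A", "2.5", "3.0", "-", "4.0", "4.5", "5.0", "5.5", "6.0"],
})

curated = curate(raw)
train, test = split(curated)
print(curated.dtypes)
print(len(train), len(test))
```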
---
## Limitations
- Data originates from the World Bank Group and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/world-bank-trade-indicators-for-federal-republic-of-somalia) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_world_bank_trade_indicators_for_federal_republic_of_somalia,
title = {Federal Republic of Somalia - Trade},
author = {World Bank Group},
year = {2025},
url = {https://data.humdata.org/dataset/world-bank-trade-indicators-for-federal-republic-of-somalia},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*