electricsheepafrica/africa-world-bank-gender-indicators-for-federal-republic-of-somalia
Hosted on Hugging Face · Updated 2026-04-07 · Indexed 2026-04-12
Download link: https://hf-mirror.com/datasets/electricsheepafrica/africa-world-bank-gender-indicators-for-federal-republic-of-somalia
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- gender
- hxl
- indicators
- som
pretty_name: "Federal Republic of Somalia - Gender"
dataset_info:
  splits:
  - name: train
    num_examples: 4250
  - name: test
    num_examples: 1062
---
# Federal Republic of Somalia - Gender
**Publisher:** World Bank Group · **Source:** [HDX](https://data.humdata.org/dataset/world-bank-gender-indicators-for-federal-republic-of-somalia) · **License:** `cc-by` · **Updated:** 2025-11-04
---
## Abstract
This dataset contains data from the World Bank's [data portal](http://data.worldbank.org/). There is also a [consolidated country dataset](https://data.humdata.org/dataset/world-bank-combined-indicators-for-federal-republic-of-somalia) on HDX.
Gender equality is a core development objective in its own right. It is also smart development policy and sound business practice. It is integral to economic growth, business growth and good development outcomes. Gender equality can boost productivity, enhance prospects for the next generation, build resilience, and make institutions more representative and effective. In December 2015, the World Bank Group Board discussed its new Gender Equality Strategy 2016-2023, which aims to address persistent gaps and proposes a sharpened focus on more and better gender data. The Bank Group is continually scaling up commitments and expanding partnerships to fill significant gaps in gender data. The database hosts the latest sex-disaggregated data and gender statistics covering demography, education, health, access to economic opportunities, public life and decision-making, and agency.
Each row is a country-level aggregate: the value of one indicator for one year. Data was last updated on HDX on 2025-11-04. Geographic scope: **SOM**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Public health |
| **Unit of observation** | Country-level aggregates |
| **Rows (total)** | 5,313 (the train and test splits listed below sum to 5,312) |
| **Columns** | 8 (2 numeric, 6 categorical, 0 datetime) |
| **Train split** | 4,250 rows |
| **Test split** | 1,062 rows |
| **Geographic scope** | SOM |
| **Publisher** | World Bank Group |
| **HDX last updated** | 2025-11-04 |
---
## Variables
**Geographic:** `country_name` (sample values: "Federal Republic of Somalia", "#country+name"), `country_iso3` (sample values: "SOM", "#country+code"), `year` (range 1960.0–2024.0).
**Outcome / Measurement:** `value` (range 0.0–1072683.0).
**Identifier / Metadata:** `indicator_name` (sample values: "Age population, age 02, male"; "Age population, age 00, female"; "Age population, age 05, male"), `indicator_code` (sample values: SP.POP.AG02.MA.IN, SP.POP.AG00.FE.IN, SP.POP.AG05.MA.IN), `esa_source` (HDX), `esa_processed` (2026-04-07).
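
The sample pairs above suggest that the sex of a series is encoded in `indicator_code`: `SP.POP.AG02.MA.IN` pairs with a "male" name and `SP.POP.AG00.FE.IN` with a "female" one. A minimal sketch of reading that segment follows. It assumes the `MA`/`FE` convention holds for the other indicators in the file, which this card does not confirm, and `sex_from_code` is a hypothetical helper name.

```python
from typing import Optional

# Assumed mapping, inferred only from the three sample codes on this card.
SEX_SEGMENTS = {"MA": "male", "FE": "female"}


def sex_from_code(indicator_code: str) -> Optional[str]:
    """Return 'male' or 'female' if a dot-separated segment marks the sex, else None."""
    for segment in indicator_code.split("."):
        if segment in SEX_SEGMENTS:
            return SEX_SEGMENTS[segment]
    return None


for code in ["SP.POP.AG02.MA.IN", "SP.POP.AG00.FE.IN", "SP.POP.AG05.MA.IN"]:
    print(code, sex_from_code(code))
```

Codes without an `MA` or `FE` segment return `None`, so indicators that are not sex-disaggregated can be handled separately.
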
---
## Quick Start
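```python
from datasets import load_dataset

# Download both splits from the Hugging Face Hub.
ds = load_dataset("electricsheepafrica/africa-world-bank-gender-indicators-for-federal-republic-of-somalia")

# Convert each split to a pandas DataFrame.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()
print(train.shape)   # (rows, columns)
print(train.head())  # first five rows
```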
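
All indicators share one `value` column, so most analyses start by isolating a single `indicator_code`. The sketch below assumes rows arrive as dictionaries keyed by the column names on this card (iterating over `ds["train"]` yields such dictionaries). `series_for` is a hypothetical helper, and the sample rows are placeholders, not values from the dataset.

```python
def series_for(rows, indicator_code):
    """Return [(year, value), ...] for one indicator, sorted by year."""
    points = [
        (int(row["year"]), row["value"])  # year is stored as float64, e.g. 1960.0
        for row in rows
        if row["indicator_code"] == indicator_code
    ]
    return sorted(points)


# Placeholder rows for illustration only; not real values from the dataset.
rows = [
    {"year": 1961.0, "indicator_code": "SP.POP.AG02.MA.IN", "value": 2.0},
    {"year": 1960.0, "indicator_code": "SP.POP.AG02.MA.IN", "value": 1.0},
    {"year": 1960.0, "indicator_code": "SP.POP.AG00.FE.IN", "value": 3.0},
]
print(series_for(rows, "SP.POP.AG02.MA.IN"))  # [(1960, 1.0), (1961, 2.0)]
```

With the real data, `series_for(ds["train"], "SP.POP.AG02.MA.IN")` would return only the years that landed in the train split; combine both splits first if a complete time series is needed.
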
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `country_name` | object | 0.0% | "Federal Republic of Somalia", "#country+name" |
| `country_iso3` | object | 0.0% | "SOM", "#country+code" |
| `year` | float64 | 0.0% | 1960.0 – 2024.0 (mean 1996.8556) |
| `indicator_name` | object | 0.0% | "Age population, age 02, male"; "Age population, age 00, female"; "Age population, age 05, male" |
| `indicator_code` | object | 0.0% | SP.POP.AG02.MA.IN, SP.POP.AG00.FE.IN, SP.POP.AG05.MA.IN |
| `value` | float64 | 0.0% | 0.0 – 1072683.0 (mean 27929.1292) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-07 |
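
The sample values for `country_name` and `country_iso3` include `#country+name` and `#country+code`, which look like HXL hashtags rather than data. If an HXL tag row did survive in the data (an assumption; this card does not say so), it could be filtered out as sketched here. `is_hxl_tag_row` is a hypothetical helper and the rows are placeholders.

```python
def is_hxl_tag_row(row):
    """True when the country code field holds an HXL hashtag instead of an ISO3 code."""
    return str(row["country_iso3"]).startswith("#")


# Placeholder rows for illustration only.
rows = [
    {"country_name": "#country+name", "country_iso3": "#country+code"},
    {"country_name": "Federal Republic of Somalia", "country_iso3": "SOM"},
]
clean = [row for row in rows if not is_hxl_tag_row(row)]
print(len(clean))  # 1
```
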
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `year` | 1960.0 | 2024.0 | 1996.8556 | 1998.0 |
| `value` | 0.0 | 1072683.0 | 27929.1292 | 12.8109 |
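
The gap between the mean (27929.1292) and the median (12.8109) of `value` is consistent with indicators on very different scales being pooled in one column. The sample indicators are population counts, while a median near 13 would be more typical of rates or percentages; this is an inference, not something the card states. Summarising per `indicator_code` avoids mixing scales. The sketch below uses placeholder values, and `EXAMPLE.PCT` is a made-up code, not a real World Bank indicator.

```python
from collections import defaultdict
from statistics import median


def median_by_indicator(rows):
    """Group values by indicator_code and return {code: median value}."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["indicator_code"]].append(row["value"])
    return {code: median(values) for code, values in grouped.items()}


# Placeholder rows: one count-like series and one percentage-like series.
rows = [
    {"indicator_code": "SP.POP.AG02.MA.IN", "value": 100000.0},
    {"indicator_code": "SP.POP.AG02.MA.IN", "value": 120000.0},
    {"indicator_code": "EXAMPLE.PCT", "value": 10.0},
    {"indicator_code": "EXAMPLE.PCT", "value": 12.0},
    {"indicator_code": "EXAMPLE.PCT", "value": 14.0},
]
print(median_by_indicator(rows))
```
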
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. Two columns were cast from string to a numeric type because more than 85% of their values parsed successfully; the same rule covers datetime casts, but no column in this dataset is datetime. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
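
The card does not publish the curation code. The following is a standard-library sketch of the steps described above, not ESA's implementation. All function names are hypothetical, `None` stands in for `NaN`, and computing the parse-success rate over non-missing cells is an assumption.

```python
import random
import re

MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def to_snake_case(name):
    """Lowercase a column name and join its alphanumeric runs with underscores."""
    return "_".join(re.findall(r"[a-z0-9]+", name.lower()))


def normalise_missing(cell):
    """Map the missing-value markers listed above to None; pass other cells through."""
    if isinstance(cell, str) and cell.strip().lower() in MISSING_MARKERS:
        return None
    return cell


def try_float(cell):
    """Return the cell as a float, or None if it does not parse."""
    try:
        return float(cell)
    except (TypeError, ValueError):
        return None


def cast_if_numeric(column, threshold=0.85):
    """Cast to float when more than `threshold` of the non-missing cells parse."""
    present = [c for c in column if c is not None]
    parsed = [try_float(c) for c in present]
    if present and sum(p is not None for p in parsed) / len(present) > threshold:
        return [None if c is None else try_float(c) for c in column]
    return column


def split_80_20(rows, seed=42):
    """Shuffle with a fixed seed and cut at 80 percent."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


print(to_snake_case("Indicator Name"))  # indicator_name
column = [normalise_missing(c) for c in ["1960", "N/A", "2024", "12.5"]]
print(cast_if_numeric(column))          # [1960.0, None, 2024.0, 12.5]
train, test = split_80_20(range(10))
print(len(train), len(test))            # 8 2
```

Fixing the seed makes the split reproducible: rerunning `split_80_20` on the same rows yields the same partitions.
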
---
## Limitations
- Data originates from the World Bank Group and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/world-bank-gender-indicators-for-federal-republic-of-somalia) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_world_bank_gender_indicators_for_federal_republic_of_somalia,
title = {Federal Republic of Somalia - Gender},
author = {World Bank Group},
year = {2025},
url = {https://data.humdata.org/dataset/world-bank-gender-indicators-for-federal-republic-of-somalia},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider: electricsheepafrica



