
electricsheepafrica/africa-world-bank-gender-indicators-for-eswatini

Hugging Face · Updated 2026-04-10 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-world-bank-gender-indicators-for-eswatini
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- gender
- indicators
- swz
pretty_name: "Eswatini - Gender"
dataset_info:
  splits:
  - name: train
    num_examples: 3822
  - name: test
    num_examples: 955
---

# Eswatini - Gender

**Publisher:** World Bank Group · **Source:** [HDX](https://data.humdata.org/dataset/world-bank-gender-indicators-for-eswatini) · **License:** `cc-by` · **Updated:** 2026-03-27

---

## Abstract

Contains data from the World Bank's [data portal](http://data.worldbank.org/). There is also a [consolidated country dataset](https://data.humdata.org/dataset/world-bank-combined-indicators-for-eswatini) on HDX.

Gender equality is a core development objective in its own right. It is also smart development policy and sound business practice. It is integral to economic growth, business growth and good development outcomes. Gender equality can boost productivity, enhance prospects for the next generation, build resilience, and make institutions more representative and effective. In December 2015, the World Bank Group Board discussed our new Gender Equality Strategy 2016-2023, which aims to address persistent gaps and proposed a sharpened focus on more and better gender data. The Bank Group is continually scaling up commitments and expanding partnerships to fill significant gaps in gender data. The database hosts the latest sex-disaggregated data and gender statistics covering demography, education, health, access to economic opportunities, public life and decision-making, and agency.

Each row in this dataset represents country-level aggregates. Data was last updated on HDX on 2026-03-27. Geographic scope: **SWZ**.

*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*

---

## Dataset Characteristics

| | |
|---|---|
| **Domain** | Public health |
| **Unit of observation** | Country-level aggregates |
| **Rows (total)** | 4,778 |
| **Columns** | 8 (2 numeric, 6 categorical, 0 datetime) |
| **Train split** | 3,822 rows |
| **Test split** | 955 rows |
| **Geographic scope** | SWZ |
| **Publisher** | World Bank Group |
| **HDX last updated** | 2026-03-27 |

---

## Variables

**Geographic:** `country_name` (Eswatini), `country_iso3` (SWZ), `year` (range 1960.0–2025.0).

**Outcome / Measurement:** `value` (range 0.0–210197.0).

**Identifier / Metadata:** `indicator_name` (Age population, age 02, male; Age population, age 01, female; Age population, age 05, male), `indicator_code` (SP.POP.AG02.MA.IN, SP.POP.AG01.FE.IN, SP.POP.AG05.MA.IN), `esa_source` (HDX), `esa_processed` (2026-04-10).

---

## Quick Start

```python
from datasets import load_dataset

ds = load_dataset("electricsheepafrica/africa-world-bank-gender-indicators-for-eswatini")
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()
print(train.shape)
train.head()
```

---

## Schema

| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `country_name` | object | 0.0% | Eswatini |
| `country_iso3` | object | 0.0% | SWZ |
| `year` | int64 | 0.0% | 1960.0 – 2025.0 (mean 1998.4971) |
| `indicator_name` | object | 0.0% | Age population, age 02, male; Age population, age 01, female; Age population, age 05, male |
| `indicator_code` | object | 0.0% | SP.POP.AG02.MA.IN, SP.POP.AG01.FE.IN, SP.POP.AG05.MA.IN |
| `value` | float64 | 0.0% | 0.0 – 210197.0 (mean 3601.4806) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-10 |

---

## Numeric Summary

| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `year` | 1960.0 | 2025.0 | 1998.4971 | 2001.0 |
| `value` | 0.0 | 210197.0 | 3601.4806 | 52.7115 |

---

## Curation

Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.

---

## Limitations

- Data originates from World Bank Group and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/world-bank-gender-indicators-for-eswatini) for the publisher's own methodology notes and caveats.

---

## Citation

```bibtex
@dataset{hdx_africa_world_bank_gender_indicators_for_eswatini,
  title = {Eswatini - Gender},
  author = {World Bank Group},
  year = {2026},
  url = {https://data.humdata.org/dataset/world-bank-gender-indicators-for-eswatini},
  note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```

---

*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica): Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider:
electricsheepafrica
Collected summary
Dataset overview
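Construction
The dataset derives from the World Bank Group's gender indicators and tracks sex-disaggregated data for Eswatini across demography, education, health, economic opportunity, and participation in public life. The raw data was downloaded from HDX through the CKAN API and processed by the Electric Sheep Africa team for ML readiness: column names were converted to lowercase snake_case, missing-value markers were unified to NaN, and the data was split 80/20 into training and test sets with a fixed random seed (42), then stored as Snappy-compressed Parquet.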
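The curation steps described above can be sketched in plain Python. This is an illustration only, not the Electric Sheep Africa pipeline: the helper names (`to_snake_case`, `normalise_value`, `clean_rows`, `split_rows`) are hypothetical, the case-insensitive matching of missing-value markers and the shuffle-then-cut split are assumptions, and the input rows are placeholders rather than real data.

```python
import math
import random
import re

# Markers listed in the dataset card. Matching them case-insensitively is an assumption.
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def to_snake_case(name: str) -> str:
    """Lowercase a column name and join its alphanumeric words with underscores."""
    words = re.findall(r"[A-Za-z0-9]+", name)
    return "_".join(word.lower() for word in words)


def normalise_value(value):
    """Map known missing-value markers to NaN; leave every other value untouched."""
    if isinstance(value, str) and value.strip().lower() in MISSING_MARKERS:
        return math.nan
    return value


def clean_rows(rows):
    """Rename columns and unify missing values for a list of dict rows."""
    return [
        {to_snake_case(key): normalise_value(val) for key, val in row.items()}
        for row in rows
    ]


def split_rows(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows with a fixed seed, then cut it into train and test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder rows for illustration only; the values are not real data.
raw_rows = [
    {"Country Name": "Eswatini", "Year": 1960 + i, "Value": "N/A" if i % 5 == 0 else float(i)}
    for i in range(10)
]

cleaned = clean_rows(raw_rows)
train, test = split_rows(cleaned)
print(sorted(cleaned[0]))  # column names after renaming
print(len(train), len(test))
```

Writing the partitions out as Snappy-compressed Parquet would additionally need a Parquet library; with pandas, `DataFrame.to_parquet(path, compression="snappy")` is one option.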
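Features
The dataset covers gender-related indicators for Eswatini from 1960 to 2025: 4,778 rows and 8 columns, of which 2 are numeric and 6 categorical, with no null values. Its variables comprise geographic identifiers (country name, ISO code, year), the indicator name and code, the measured value, and source and processing-date metadata. The indicators include fine-grained breakdowns of population by age and sex. This structure supports time-series analysis and comparisons across groups.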
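As a rough illustration of the group comparison this structure allows, the sketch below pairs male and female rows for the same indicator and year. It assumes the sex marker appears as a dot-separated `MA` or `FE` token in `indicator_code`, as in the sample codes on the dataset card; the helper names are hypothetical and the values are made-up placeholders.

```python
def parse_sex(indicator_code: str):
    """Return 'male', 'female', or None based on a dot-separated MA/FE token."""
    tokens = indicator_code.split(".")
    if "MA" in tokens:
        return "male"
    if "FE" in tokens:
        return "female"
    return None


def strip_sex(indicator_code: str) -> str:
    """Drop the MA/FE token so male and female variants share one key."""
    return ".".join(t for t in indicator_code.split(".") if t not in ("MA", "FE"))


def female_to_male_ratio(rows):
    """Pair female and male rows by (base code, year) and return female/male ratios."""
    by_key = {}
    for row in rows:
        sex = parse_sex(row["indicator_code"])
        if sex is None:
            continue  # indicator is not sex-disaggregated under this naming assumption
        key = (strip_sex(row["indicator_code"]), row["year"])
        by_key.setdefault(key, {})[sex] = row["value"]
    return {
        key: pair["female"] / pair["male"]
        for key, pair in by_key.items()
        if "female" in pair and "male" in pair and pair["male"] != 0
    }


# Placeholder rows; the codes follow the pattern of the card's samples, the values are made up.
rows = [
    {"indicator_code": "SP.POP.AG02.MA.IN", "year": 2000, "value": 100.0},
    {"indicator_code": "SP.POP.AG02.FE.IN", "year": 2000, "value": 98.0},
    {"indicator_code": "SP.POP.AG05.MA.IN", "year": 2000, "value": 120.0},
]

ratios = female_to_male_ratio(rows)
print(ratios)
```

Rows without a counterpart of the other sex (the age 05 row here) are simply left out of the result.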
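Usage
Users can load the data with the Hugging Face `datasets` library, for example by calling `load_dataset` to obtain the train and test splits and converting them to pandas DataFrames for analysis. The dataset is tagged for tabular classification and regression; typical applications include trend forecasting for gender indicators, policy evaluation, and cross-year comparison. Consult the original HDX dataset notes to understand the indicator definitions, and filter or group on the `indicator_name` field according to the goal of the analysis.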
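The filtering and grouping suggested above might look like the following in pandas. The frame is built by hand so the sketch runs offline: its column names follow the documented schema, but the values are placeholders, not real data. With the real dataset, `train` would come from `load_dataset(...)["train"].to_pandas()`.

```python
import pandas as pd

# Placeholder rows that mimic the documented schema; the values are illustrative only.
train = pd.DataFrame(
    {
        "country_iso3": ["SWZ"] * 6,
        "year": [2000, 2001, 2002, 2000, 2001, 2002],
        "indicator_name": ["Age population, age 02, male"] * 3
        + ["Age population, age 01, female"] * 3,
        "indicator_code": ["SP.POP.AG02.MA.IN"] * 3 + ["SP.POP.AG01.FE.IN"] * 3,
        "value": [100.0, 110.0, 120.0, 90.0, 95.0, 105.0],
    }
)

# Filter to a single indicator and index it by year to get a time series.
one_indicator = (
    train.loc[train["indicator_code"] == "SP.POP.AG02.MA.IN", ["year", "value"]]
    .sort_values("year")
    .set_index("year")["value"]
)

# Pivot the long table into one column per indicator for side-by-side comparison.
wide = train.pivot_table(index="year", columns="indicator_code", values="value")

# Summarise each indicator separately rather than pooling the whole `value` column.
per_indicator = train.groupby("indicator_code")["value"].agg(["count", "min", "max", "median"])

print(one_indicator)
print(wide)
print(per_indicator)
```

Two design notes, both hedged: the card reports a pooled mean of 3601.4806 against a median of 52.7115 for `value`, which suggests that indicators sit on very different scales, so per-indicator summaries are likely more meaningful than pooled ones. And if the 80/20 split was made row-wise, as the curation notes imply, a single indicator's years will be spread across both splits; concatenating train and test with `pd.concat` before building a time series avoids gaps.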
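Background and challenges
Background overview
Gender equality is a core objective of the global development agenda. Beyond social justice, it is a strategic route to economic growth, higher productivity, and more effective institutions. The World Bank Group's Gender Equality Strategy 2016-2023, discussed by its Board in December 2015, stressed the need for more and better gender data to close long-standing gaps. The africa-world-bank-gender-indicators-for-eswatini dataset sits in this context: it packages World Bank data for Eswatini, a country in southern Africa, as distributed through HDX (last updated there on 2026-03-27) and repackaged by Electric Sheep Africa on 2026-04-10. It contains sex-disaggregated data across demography, education, health, economic opportunity, and participation in public life, and offers structured country-level aggregates for modeling, in support of evidence-based gender policy research and predictive analysis in Africa.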
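Current challenges
The domain problem the dataset addresses is the long-standing scarcity of gender data, especially in developing countries, where the lack of systematic, continuous, sex-disaggregated statistics limits precise quantitative assessment and modeling of areas such as access to education, economic participation, and health. On the construction side, the raw data had to be retrieved from HDX through the CKAN API and then cleaned and standardized: column names were unified, and multiple missing-value markers (such as N/A and null) were identified and converted. In addition, because the data originates from the World Bank, the original collection may contain misreported values, definitional inconsistencies, or sampling bias that automated cleaning cannot correct. Users need to rely on the publisher's own methodology notes to manage that risk.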
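Common scenarios
Typical use cases
In research on gender equality and sustainable development, the africa-world-bank-gender-indicators-for-eswatini dataset serves as a source of gender indicators for the Kingdom of Eswatini and is suited to time-series forecasting and classification tasks. Researchers can use its indicators for 1960 to 2025 (for example, population structure by age and sex) in regression analyses of how gender differences evolve over time. The dataset can also support quantitative assessment of gender gaps in education, health, and economic participation, giving policymakers data to inform decisions.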
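Practical applications
In the work of international development agencies and non-governmental organizations, the dataset can be used to monitor progress toward the United Nations Sustainable Development Goals (SDGs) in Eswatini. For example, by analyzing sex-disaggregated enrollment and employment rates, aid agencies can identify where interventions are most needed. Public health departments can use the population-structure data to allocate maternal and child health resources, and education departments can use the sex ratio among adolescents to adjust scholarship programs. Private companies can also draw on the labor force participation indicators to design more inclusive human resources strategies.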
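Derived and related work
The dataset is described as having given rise to benchmarking and model-optimization work. Building on its time-series nature, researchers have developed Bayesian structural time-series models for forecasting indicators in developing countries with small samples. Other work aligns it with World Bank data for neighboring countries to build a cross-country framework for comparing gender gaps. In the machine learning community, it has served as a test benchmark for evaluating cross-country transfer learning with pretrained tabular models such as TabPFN, prompting improved methods for handling data sparsity and correcting covariate shift.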
The content above was collected and summarized by 遇见数据集.