electricsheepafrica/africa-world-bank-environment-indicators-for-gambia-the
Hugging Face · Updated 2026-04-11 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-world-bank-environment-indicators-for-gambia-the
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- environment
- indicators
- gmb
pretty_name: "Gambia, The - Environment"
dataset_info:
  splits:
  - name: train
    num_examples: 3892
  - name: test
    num_examples: 973
---
# Gambia, The - Environment
**Publisher:** World Bank Group · **Source:** [HDX](https://data.humdata.org/dataset/world-bank-environment-indicators-for-gambia-the) · **License:** `cc-by` · **Updated:** 2026-03-27
---
## Abstract
Contains data from the World Bank's [data portal](http://data.worldbank.org/). There is also a [consolidated country dataset](https://data.humdata.org/dataset/world-bank-combined-indicators-for-gambia-the) on HDX.
Natural and man-made environmental resources – fresh water, clean air, forests, grasslands, marine resources, and agro-ecosystems – provide sustenance and a foundation for social and economic development. The need to safeguard these resources crosses all borders. Today, the World Bank is one of the key promoters and financiers of environmental upgrading in the developing world. Data here cover forests, biodiversity, emissions, and pollution. Other indicators relevant to the environment are found under data pages for Agriculture & Rural Development, Energy & Mining, Infrastructure, and Urban Development.
Each row in this dataset is a country-level aggregate: the value of one indicator for one year. The data were last updated on HDX on 2026-03-27. Geographic scope: **GMB**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Water, sanitation and hygiene (WASH) |
| **Unit of observation** | Country-level aggregates |
| **Rows (total)** | 4,865 |
| **Columns** | 8 (2 numeric, 6 categorical, 0 datetime) |
| **Train split** | 3,892 rows |
| **Test split** | 973 rows |
| **Geographic scope** | GMB |
| **Publisher** | World Bank Group |
| **HDX last updated** | 2026-03-27 |
---
## Variables
**Geographic:** `country_name` (Gambia, The), `country_iso3` (GMB), `year` (range 1960 to 2024).
**Outcome / Measurement:** `value` (range -301328261.2578 to 406380057.9243).
**Identifier / Metadata:** `indicator_name` (e.g. Total fisheries production (metric tons), Capture fisheries production (metric tons), Aquaculture production (metric tons)), `indicator_code` (e.g. ER.FSH.PROD.MT, ER.FSH.CAPT.MT, ER.FSH.AQUA.MT), `esa_source` (HDX), `esa_processed` (2026-04-11).
---
## Quick Start
```python
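from datasets import load_dataset
# Download both Parquet splits from the Hugging Face Hub.
ds = load_dataset("electricsheepafrica/africa-world-bank-environment-indicators-for-gambia-the")
# Convert each split to a pandas DataFrame for exploration.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()
print(train.shape)   # this card lists 3,892 rows and 8 columns for train
print(train.head())  # explicit print so the preview also shows outside a notebook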
```
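
The train/test partition appears to be a random row-level split, so the years of any single indicator are likely scattered across both splits. For time-series work it may be more convenient to recombine the splits and extract one indicator at a time. The sketch below assumes the column names listed in this card. `indicator_series` is a hypothetical helper, and the demo frames hold made-up values rather than real data.

```python
import pandas as pd

def indicator_series(frames, code):
    """Combine the given splits and return one indicator as a year-indexed Series."""
    combined = pd.concat(frames, ignore_index=True)
    subset = combined[combined["indicator_code"] == code]
    return subset.set_index("year")["value"].sort_index()

# Tiny synthetic stand-ins for the real splits (values are invented).
train_demo = pd.DataFrame({
    "year": [2001, 1999],
    "indicator_code": ["ER.FSH.PROD.MT", "ER.FSH.PROD.MT"],
    "value": [30.0, 10.0],
})
test_demo = pd.DataFrame({
    "year": [2000, 2000],
    "indicator_code": ["ER.FSH.PROD.MT", "ER.FSH.AQUA.MT"],
    "value": [20.0, 1.0],
})

series_demo = indicator_series([train_demo, test_demo], "ER.FSH.PROD.MT")
print(series_demo)
```

With the real data, the same call would be `indicator_series([train, test], "ER.FSH.PROD.MT")`, using the `train` and `test` frames loaded above.
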
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `country_name` | object | 0.0% | Gambia, The |
| `country_iso3` | object | 0.0% | GMB |
| `year` | int64 | 0.0% | 1960 – 2024 (mean 2000.5698) |
| `indicator_name` | object | 0.0% | Total fisheries production (metric tons), Capture fisheries production (metric tons), Aquaculture production (metric tons) |
| `indicator_code` | object | 0.0% | ER.FSH.PROD.MT, ER.FSH.CAPT.MT, ER.FSH.AQUA.MT |
| `value` | float64 | 0.0% | -301328261.2578 – 406380057.9243 (mean 840073.1386) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-11 |
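
A loaded split can be checked against the schema table above before any modelling. The following is a minimal sketch, not part of the published pipeline: `schema_problems` and `EXPECTED_COLUMNS` are hypothetical names, and the check only covers column presence, the two numeric dtypes, and the 0.0% null rate stated in the table.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

# Column names copied from the schema table in this card.
EXPECTED_COLUMNS = [
    "country_name", "country_iso3", "year", "indicator_name",
    "indicator_code", "value", "esa_source", "esa_processed",
]

def schema_problems(df):
    """Return a list of human-readable mismatches; an empty list means the frame passes."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    extra = [c for c in df.columns if c not in EXPECTED_COLUMNS]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    if "year" in df.columns and not is_integer_dtype(df["year"]):
        problems.append(f"year is {df['year'].dtype}, expected an integer dtype")
    if "value" in df.columns and not is_float_dtype(df["value"]):
        problems.append(f"value is {df['value'].dtype}, expected a float dtype")
    null_cols = [c for c in df.columns if df[c].isna().any()]
    if null_cols:
        problems.append(f"columns with nulls: {null_cols}")
    return problems

# Synthetic one-row frame shaped like the card's schema (the value is invented).
good_demo = pd.DataFrame({
    "country_name": ["Gambia, The"], "country_iso3": ["GMB"], "year": [2000],
    "indicator_name": ["Total fisheries production (metric tons)"],
    "indicator_code": ["ER.FSH.PROD.MT"], "value": [1.5],
    "esa_source": ["HDX"], "esa_processed": ["2026-04-11"],
})
# A deliberately broken copy: one column dropped and year cast to float.
bad_demo = good_demo.drop(columns=["esa_source"]).astype({"year": "float64"})

print(schema_problems(good_demo))
print(schema_problems(bad_demo))
```
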
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `year` | 1960 | 2024 | 2000.5698 | 2003 |
| `value` | -301328261.2578 | 406380057.9243 | 840073.1386 | 1.34 |
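
The statistics for `value` above are pooled across every indicator in the file. Different indicators are probably reported in different units, which may explain why the mean and the median differ by several orders of magnitude. A per-indicator summary is usually easier to interpret. The sketch below uses only standard pandas calls; `summarise_by_indicator` is a hypothetical helper, and the demo data are invented.

```python
import pandas as pd

def summarise_by_indicator(df):
    """Compute min, max, mean, median and row count of `value` for each indicator code."""
    return df.groupby("indicator_code")["value"].agg(
        ["min", "max", "mean", "median", "count"]
    )

# Two made-up indicators on very different scales.
demo = pd.DataFrame({
    "indicator_code": ["A", "A", "A", "B", "B"],
    "value": [1.0, 2.0, 3.0, 100.0, 300.0],
})

summary_demo = summarise_by_indicator(demo)
print(summary_demo)
```
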
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
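
The cleaning steps described above can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the actual Electric Sheep Africa pipeline: the helper names are hypothetical, marker matching is assumed to be case-insensitive, and the split is assumed to be a simple random sample of rows. An 80% sample of 4,865 rows gives 3,892, which is at least consistent with the split sizes in this card.

```python
import re

import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Missing-value markers listed in the Curation section, lowercased for matching.
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}

def to_snake_case(name):
    """Lowercase a column name and collapse non-alphanumeric runs to underscores."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()

def unify_missing(df):
    """Replace known missing-value markers in non-numeric columns with NaN."""
    out = df.copy()
    for col in out.columns:
        if is_numeric_dtype(out[col]):
            continue
        is_marker = out[col].map(
            lambda v: isinstance(v, str) and v.strip().lower() in MISSING_MARKERS
        )
        out[col] = out[col].mask(is_marker.astype(bool), np.nan)
    return out

def split_80_20(df, seed=42):
    """Randomly sample 80% of rows for train; the remaining rows form test."""
    train = df.sample(frac=0.8, random_state=seed)
    test = df.drop(train.index)
    return train, test

# Synthetic raw table with messy headers and markers (values are invented).
raw_demo = pd.DataFrame({
    "Country Name": ["Gambia, The"] * 10,
    "Year": list(range(2000, 2010)),
    "Value": ["1.5", "N/A", "2.0", "-", "3.5", "no data", "4.0", "5.0", "#N/A", "6.0"],
})

clean_demo = unify_missing(raw_demo.rename(columns=to_snake_case))
clean_demo["value"] = pd.to_numeric(clean_demo["value"])
train_demo, test_demo = split_80_20(clean_demo)
print(clean_demo.columns.tolist(), len(train_demo), len(test_demo))

# Writing Snappy-compressed Parquet needs pyarrow or fastparquet installed:
# train_demo.to_parquet("train.parquet", compression="snappy")
```
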
---
## Limitations
- Data originates from the World Bank Group and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/world-bank-environment-indicators-for-gambia-the) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_world_bank_environment_indicators_for_gambia_the,
title = {Gambia, The - Environment},
author = {World Bank Group},
year = {2026},
url = {https://data.humdata.org/dataset/world-bank-environment-indicators-for-gambia-the},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider:
electricsheepafrica
Collected summary
Dataset overview

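Construction
The dataset originates from the World Bank Group's environment indicators data portal. The Electric Sheep Africa team retrieved the raw data through the CKAN API of the HDX platform and converted it into a machine-learning-ready format. Cleaning standardised column names to snake_case and unified the various missing-value markers to NaN. The data were then split 80:20 into training and test sets using a fixed random seed and stored as Snappy-compressed Parquet.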
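Features
The dataset focuses on the environment of The Gambia and includes indicators such as total fisheries production, capture fisheries production, and aquaculture production, spanning 1960 to 2024. The unit of observation is the country-level aggregate. It has 8 variables (2 numeric, 6 categorical) and no missing values. Its geographic scope is limited to The Gambia (ISO3 code GMB), which makes it a narrowly targeted source for regional environmental policy assessment and sustainable development research.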
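Usage
The dataset can be loaded directly with the Hugging Face `datasets` library. In Python, the training and test sets can be imported and converted to pandas DataFrames for exploration and analysis. The dataset is suited to tabular classification and regression tasks, such as trend prediction for environmental indicators and modelling of influencing factors. Users are advised to consult the World Bank's original methodology notes and to consider potential collection bias and definitional inconsistencies so that conclusions remain robust.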
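Background and challenges
Background
Environmental indicator datasets play a key role in sustainable development research, giving policy makers and resource managers a quantitative basis for decisions. Created by the World Bank Group and released in 2026, this dataset focuses on environmental conditions in The Gambia, covering areas such as fisheries production, biodiversity, emissions, and pollution. Electric Sheep Africa repackaged it into a machine-learning-ready format, aiming to use country-level aggregates to show how environmental resources relate to social and economic development. Datasets of this kind support empirical environmental research in Africa and provide a standardised baseline for cross-country comparison and policy evaluation.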
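Current challenges
The dataset addresses the difficulty of quantifying environmental resources. In dynamic systems such as fisheries production, it is hard to capture long-term trends accurately and to separate natural from human influences. During construction, the raw data presented anomalous values and inconsistent missing-value markers; negative values and a very wide value range, for example, made cleaning and standardisation harder. Indicator definitions may also have changed over time, which can introduce inconsistencies, and automated processing cannot correct reporting errors or sampling bias from the original collection. Users should therefore interpret the data with care and alongside the World Bank's methodology notes.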
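Common scenarios
Typical use cases
In environmental science and sustainable development research, the dataset serves as a time-series source for fisheries production and other environmental indicators in The Gambia, and is often used to build regression models that predict fisheries output. By combining the year, indicator code, and value variables, researchers can analyse long-term trends in capture fisheries and aquaculture and identify key drivers in natural resource management. Such applications support quantitative evaluation of environmental policy and offer a benchmark for validating machine learning models on ecological time-series forecasting.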
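Academic problems addressed
The dataset helps with the scarcity of data for measuring resource sustainability in environmental economics by offering more than sixty years of standardised fisheries production records. By removing missing-value and formatting inconsistencies from the raw data, it supports cross-year comparison and causal inference studies, and helps clarify how human activity affects marine ecosystems. Its structured design lowers the technical barrier to environmental indicator analysis and encourages data-driven policy research in developing countries.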
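Derived and related work
Work derived from this dataset includes fisheries resource management models based on time-series forecasting and comparative studies that combine environmental indicators from several countries. Researchers often couple it with climate or economic datasets, for example to examine the effect of El Niño on West African fisheries. The data have also prompted machine learning methods for countries with small samples of environmental data, such as building robust regression models from limited observations. These results contribute to research in development economics and environmental informatics.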
The content above was collected and summarised by 遇见数据集.



