
electricsheepafrica/africa-world-bank-environment-indicators-for-ghana

Hugging Face · Updated 2026-04-11 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-world-bank-environment-indicators-for-ghana
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- environment
- indicators
- gha
pretty_name: "Ghana - Environment"
dataset_info:
  splits:
  - name: train
    num_examples: 4102
  - name: test
    num_examples: 1025
---

# Ghana - Environment

**Publisher:** World Bank Group · **Source:** [HDX](https://data.humdata.org/dataset/world-bank-environment-indicators-for-ghana) · **License:** `cc-by` · **Updated:** 2026-03-27

---

## Abstract

Contains data from the World Bank's [data portal](http://data.worldbank.org/). There is also a [consolidated country dataset](https://data.humdata.org/dataset/world-bank-combined-indicators-for-ghana) on HDX.

Natural and man-made environmental resources – fresh water, clean air, forests, grasslands, marine resources, and agro-ecosystems – provide sustenance and a foundation for social and economic development. The need to safeguard these resources crosses all borders. Today, the World Bank is one of the key promoters and financiers of environmental upgrading in the developing world. Data here cover forests, biodiversity, emissions, and pollution. Other indicators relevant to the environment are found under data pages for Agriculture & Rural Development, Energy & Mining, Infrastructure, and Urban Development.

Each row in this dataset represents a country-level aggregate. Data was last updated on HDX on 2026-03-27. Geographic scope: **GHA**.

*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*

---

## Dataset Characteristics

| Property | Value |
|---|---|
| **Domain** | Water, sanitation and hygiene (WASH) |
| **Unit of observation** | Country-level aggregates |
| **Rows (total)** | 5,128 |
| **Columns** | 8 (2 numeric, 6 categorical, 0 datetime) |
| **Train split** | 4,102 rows |
| **Test split** | 1,025 rows |
| **Geographic scope** | GHA |
| **Publisher** | World Bank Group |
| **HDX last updated** | 2026-03-27 |

---

## Variables

- **Geographic:** `country_name` (Ghana), `country_iso3` (GHA), `year` (range 1960.0–2024.0).
- **Outcome / Measurement:** `value` (range -4211400229.4574–13233383822.2625).
- **Identifier / Metadata:** `indicator_name` (Total fisheries production (metric tons), Capture fisheries production (metric tons), Aquaculture production (metric tons)), `indicator_code` (ER.FSH.PROD.MT, ER.FSH.CAPT.MT, ER.FSH.AQUA.MT), `esa_source` (HDX), `esa_processed` (2026-04-11).


---

## Quick Start

```python
from datasets import load_dataset

# Download both splits from the Hugging Face Hub
ds = load_dataset("electricsheepafrica/africa-world-bank-environment-indicators-for-ghana")

# Convert each split to a pandas DataFrame
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())
```

---

## Schema

| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `country_name` | object | 0.0% | Ghana |
| `country_iso3` | object | 0.0% | GHA |
| `year` | int64 | 0.0% | 1960.0 – 2024.0 (mean 1999.9019) |
| `indicator_name` | object | 0.0% | Total fisheries production (metric tons), Capture fisheries production (metric tons), Aquaculture production (metric tons) |
| `indicator_code` | object | 0.0% | ER.FSH.PROD.MT, ER.FSH.CAPT.MT, ER.FSH.AQUA.MT |
| `value` | float64 | 0.0% | -4211400229.4574 – 13233383822.2625 (mean 64666853.4079) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-11 |

---

## Numeric Summary

| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `year` | 1960.0 | 2024.0 | 1999.9019 | 2002.0 |
| `value` | -4211400229.4574 | 13233383822.2625 | 64666853.4079 | 5.8853 |

---

## Curation

Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.

---

## Limitations

- Data originates from World Bank Group and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/world-bank-environment-indicators-for-ghana) for the publisher's own methodology notes and caveats.

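
The cleaning and splitting steps described under Curation could look roughly like the sketch below. It uses only the Python standard library on a list of dict records. The function names are hypothetical, and the case-insensitive marker matching and the shuffle-then-cut split are assumptions; ESA's actual pipeline, including the Snappy-compressed Parquet export, is not shown in the card and is not reproduced here.

```python
import math
import random
import re

# Missing-value markers listed in the Curation section.
# Matching them case-insensitively is an assumption of this sketch.
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def to_snake_case(name: str) -> str:
    """Lowercase a column name and join its words with underscores."""
    # Insert a separator at camelCase boundaries, e.g. "IndicatorName".
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Collapse any run of non-alphanumeric characters into one underscore.
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()


def unify_missing(value):
    """Map known missing-value markers to NaN; leave other values untouched."""
    if isinstance(value, str) and value.strip().lower() in MISSING_MARKERS:
        return math.nan
    return value


def clean_rows(rows):
    """Apply both cleaning steps to a list of dict records."""
    return [
        {to_snake_case(key): unify_missing(value) for key, value in row.items()}
        for row in rows
    ]


def split_80_20(rows, seed=42):
    """Shuffle a copy with a fixed seed, then cut at the 80% mark."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]
```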

---

## Citation

```bibtex
@dataset{hdx_africa_world_bank_environment_indicators_for_ghana,
  title = {Ghana - Environment},
  author = {World Bank Group},
  year = {2026},
  url = {https://data.humdata.org/dataset/world-bank-environment-indicators-for-ghana},
  note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```

---

*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica): Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider:
electricsheepafrica
Collected summary
Dataset introduction
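Construction
The dataset originates from the World Bank Group's environment indicators. The raw data was obtained through the Humanitarian Data Exchange (HDX) and organized by the Electric Sheep Africa team. After download via the CKAN API, column names were standardized and common missing-value markers were unified to NaN. For machine-learning use, the data was converted to Parquet and split 80:20 into training and test sets with a fixed random seed, yielding a structured table of 5,128 country-level aggregate records.
Features
The dataset covers environmental indicators for Ghana, including total fisheries production, capture fisheries production, and aquaculture production, and spans 1960 to 2024, more than sixty years of observations. It has 8 columns (2 numeric, 6 categorical) and reports no missing values. Each record is a country-level aggregate and carries a standardized indicator name and code, which eases comparison and integration with other databases. The files are stored as Snappy-compressed Parquet, which keeps storage and read costs low.
Usage
In environmental policy analysis and sustainable-development research, the dataset can serve as a basis for assessing trends in Ghana's fisheries resources. Users can load it directly with the Hugging Face `datasets` library and explore it with pandas and other Python tools. It is pre-split into training and test sets and suits time-series forecasting, regression, or classification tasks, for example predicting future fisheries production from historical data. The data comes from official World Bank statistics and has been cleaned but not independently validated, so results should be interpreted alongside the publisher's original methodology notes.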
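As a rough illustration of the trend analysis mentioned above, the sketch below groups long-format records by `indicator_code` and fits an ordinary least-squares line of `value` against `year`. The function names are hypothetical, the column names follow the dataset card's schema, and the toy records use made-up values that are not taken from the dataset.

```python
from collections import defaultdict


def series_by_indicator(rows):
    """Group long-format records into {indicator_code: [(year, value), ...]}, sorted by year."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["indicator_code"]].append((int(row["year"]), float(row["value"])))
    return {code: sorted(points) for code, points in grouped.items()}


def linear_trend(points):
    """Ordinary least-squares fit of value against year; returns (slope, intercept)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all points share the same year")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy records with made-up values, NOT taken from the dataset.
toy_rows = [
    {"year": 2000, "indicator_code": "ER.FSH.AQUA.MT", "value": 10.0},
    {"year": 2001, "indicator_code": "ER.FSH.AQUA.MT", "value": 12.0},
    {"year": 2002, "indicator_code": "ER.FSH.AQUA.MT", "value": 14.0},
    {"year": 2001, "indicator_code": "ER.FSH.CAPT.MT", "value": 5.0},
]
series = series_by_indicator(toy_rows)
slope, intercept = linear_trend(series["ER.FSH.AQUA.MT"])
print(f"slope per year: {slope:.2f}")
```

With the real data, records could be obtained from a loaded split via `ds["train"].to_pandas().to_dict("records")`. Because the published split is random rather than chronological, a forecasting experiment may prefer to pool both splits and re-split by `year`; that is a suggestion, not something the dataset card prescribes.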
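Background and challenges
Background overview
Environmental indicator datasets quantify how ecological resources and human activity interact, and they are central to the global sustainable-development agenda. The Ghana environment indicators dataset, published by the World Bank Group, includes natural-resource measures such as fisheries production and brings together country-level time series from 1960 to 2024 as a structured basis for studying environmental change. The Electric Sheep Africa team adapted it for machine learning, providing a standardized format for environmental policy analysis and predictive modeling.
Current challenges
The dataset addresses a long-standing problem in environmental science: quantifying natural-resource dynamics accurately and building reliable predictive models. Fisheries production indicators reflect the interplay between capture and aquaculture, and their values fluctuate with climate, policy, and market conditions, which makes a unified assessment framework inherently difficult. On the data side, the original collection may contain reporting bias, inconsistent definitions, or missing values, and automated cleaning cannot fully correct deeper methodological differences. Consistency checks across years and the interpretation of outliers still depend on domain expertise, which raises the bar for the robustness and interpretability of machine-learning models.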
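Common scenarios
Classic use cases
At the intersection of environmental science and development economics, the dataset provides structured time-series data for studying the dynamics of Ghana's fisheries resources. Its annual observations from 1960 to 2024 can be used to build time-series models of total fisheries production, capture production, and aquaculture production. Such analyses can relate resource-use patterns to policy interventions and give sustainable resource management an empirical footing.
Practical applications
In policy work, the dataset can help Ghanaian government agencies and international organizations monitor progress on the marine-resource indicators of the Sustainable Development Goals (SDGs). Fisheries authorities could adjust catch quotas based on production trends, and aid agencies could use the same trends to design targeted aquaculture support programs. Anomalous values in the data, such as negative production records, can also point to parts of the statistical system that need improvement.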
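One way to surface the kind of anomaly mentioned above is to flag records whose indicator is a physical quantity but whose value is negative. The sketch below is a minimal illustration under the assumption that any indicator whose name ends in "(metric tons)" should be non-negative; the function names are hypothetical, and negative values may be legitimate for other kinds of indicators.

```python
from collections import Counter


def flag_negative_quantities(rows, unit_suffix="(metric tons)"):
    """Return records whose indicator name ends with unit_suffix but whose value is negative."""
    flagged = []
    for row in rows:
        name = str(row["indicator_name"]).strip()
        if name.endswith(unit_suffix) and float(row["value"]) < 0:
            flagged.append(row)
    return flagged


def count_by_indicator(rows):
    """Count records per indicator_code to see where flagged values cluster."""
    return Counter(row["indicator_code"] for row in rows)
```

Calling `count_by_indicator(flag_negative_quantities(records))` on the loaded records gives a per-indicator tally that can be checked against the publisher's methodology notes before any value is dropped or corrected.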
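Derived and related work
Work that could build on this dataset includes machine-learning models for forecasting fisheries production and econometric analyses that couple environmental indicators with macroeconomic variables. For example, the data can be combined with climate datasets to study the effect of El Niño on West African fisheries, or used with panel-data methods to compare the outcomes of fisheries management policies in Ghana and neighboring countries. Studies of this kind add to the empirical literature in environmental economics.
The above content was collected and summarized by 遇见数据集.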