electricsheepafrica/africa-zaf-climate-trace
Source: Hugging Face · Updated: 2026-04-04 · Indexed: 2026-04-12
Download link: https://hf-mirror.com/datasets/electricsheepafrica/africa-zaf-climate-trace
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- climate-weather
- environment
- points-of-interest-poi
- zaf
pretty_name: "South Africa: Greenhouse Gas and Air Pollutant Emissions"
dataset_info:
  splits:
  - name: train
    num_examples: 8338
  - name: test
    num_examples: 2084
---
# South Africa: Greenhouse Gas and Air Pollutant Emissions
**Publisher:** Climate TRACE · **Source:** [HDX](https://data.humdata.org/dataset/zaf-climate-trace) · **License:** `cc-by` · **Updated:** 2026-03-30
---
## Abstract
Climate TRACE is a non-profit coalition of organizations building a timely, open, and accessible inventory of exactly where greenhouse gas emissions are coming from. It estimates greenhouse gas (GHG) and air pollutant emissions for over 2.7 million sources (from over 744 million assets) in every country.
The Climate TRACE emissions inventory includes:
- Annual country-level emissions by sub-sector and by gas beginning in 2015
- Monthly source-level emissions by sub-sector and gas, with confidence information, beginning in 2021
- Emissions source ownership where and when available.
Each row in this dataset is one time-series observation. Data was last updated on HDX on 2026-03-30. Geographic scope: **ZAF**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Climate and environment |
| **Unit of observation** | Time-series observations |
| **Rows (total)** | 10,423 (the train and test splits listed below sum to 10,422) |
| **Columns** | 13 (4 numeric, 9 categorical, 0 datetime) |
| **Train split** | 8,338 rows |
| **Test split** | 2,084 rows |
| **Geographic scope** | ZAF |
| **Publisher** | Climate TRACE |
| **HDX last updated** | 2026-03-30 |
---
## Variables
**Temporal**: `year` (range 2024–2026), `month` (range 1–12).
**Measure**: `emissionsquantity` (range 0.0–122675.3399).
**Identifier / Metadata**: `full_name` ("South Africa"; "KwaZulu-Natal Province, ZAF"; "Western Cape Province, ZAF"), `id` (ZAF, ZAF.4_1, ZAF.9_1), `level_0_id` (ZAF), `level_1_id` (ZAF.4_1, ZAF.9_1, ZAF.6_1), `name` (South Africa, KwaZulu-Natal Province, Western Cape Province), `esa_source` (HDX), `esa_processed` (2026-04-04).
**Other**: `level` (range 0–1), `sector` (agriculture, manufacturing, transportation), `gas` (ch4).
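
The card lists no datetime column, so `year` and `month` have to be combined before any time-indexed analysis. The following is a minimal sketch on a toy frame that reuses the card's column names. The values are invented, and pinning each observation to the first day of its month is an assumption rather than something the source specifies.

```python
import pandas as pd

# Toy rows that reuse the card's column names; the values are illustrative only.
toy = pd.DataFrame(
    {
        "year": [2024, 2024, 2025],
        "month": [1, 12, 6],
        "emissionsquantity": [10.0, 20.0, 30.0],
    }
)

# pd.to_datetime accepts a frame with year/month/day columns.
# day=1 is an assumption that pins each row to the start of its month.
toy["date"] = pd.to_datetime(toy[["year", "month"]].assign(day=1))

print(toy["date"].dt.strftime("%Y-%m").tolist())
```

The resulting `date` column can then serve as an index for resampling or plotting.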
---
## Quick Start
```python
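from datasets import load_dataset

# Download both splits from the Hugging Face Hub.
ds = load_dataset("electricsheepafrica/africa-zaf-climate-trace")

# Convert each split to a pandas DataFrame.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())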
```
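
Once the splits are loaded, a common first step is to total emissions per sector. The sketch below uses a small stand-in for the `train` frame so that it runs without network access. The values are invented, and reading `level == 1` as "province-level row" is an assumption based on the sample identifiers, not something the card states.

```python
import pandas as pd

# Toy stand-in for the `train` frame; column names follow the card, values are invented.
train = pd.DataFrame(
    {
        "level": [0, 1, 1, 1],
        "name": [
            "South Africa",
            "KwaZulu-Natal Province",
            "KwaZulu-Natal Province",
            "Western Cape Province",
        ],
        "sector": ["agriculture", "agriculture", "transportation", "agriculture"],
        "emissionsquantity": [100.0, 30.0, 5.0, 20.0],
    }
)

# Keep only level-1 rows (assumed to be provinces) so that a national row
# is not summed together with the provinces it may already include.
provinces = train[train["level"] == 1]

totals = provinces.groupby("sector")["emissionsquantity"].sum()
print(totals.to_dict())
```

Whether national rows actually duplicate the provincial ones should be checked against the publisher's methodology before relying on such totals.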
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `full_name` | object | 0.0% | "South Africa"; "KwaZulu-Natal Province, ZAF"; "Western Cape Province, ZAF" |
| `id` | object | 0.0% | ZAF, ZAF.4_1, ZAF.9_1 |
| `level` | int64 | 0.0% | 0 – 1 (mean 0.8919) |
| `level_0_id` | object | 0.0% | ZAF |
| `level_1_id` | object | 10.8% | ZAF.4_1, ZAF.9_1, ZAF.6_1 |
| `name` | object | 0.0% | South Africa, KwaZulu-Natal Province, Western Cape Province |
| `year` | int64 | 0.0% | 2024 – 2026 (mean 2024.6148) |
| `month` | int64 | 0.0% | 1 – 12 (mean 6.6659) |
| `sector` | object | 0.0% | agriculture, manufacturing, transportation |
| `gas` | object | 0.0% | ch4 |
| `emissionsquantity` | float64 | 0.0% | 0.0 – 122675.3399 (mean 2316.6402) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-04 |
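
The 10.8% null share in `level_1_id` is close to the share of rows with `level` equal to 0 implied by the mean of 0.8919. This suggests, although the card does not say so, that the missing values belong to country-level rows. A minimal sketch of how one might check this follows. The helper name `null_share_by_level` is hypothetical and the toy values are invented.

```python
import pandas as pd


def null_share_by_level(df: pd.DataFrame) -> pd.Series:
    """Return the fraction of missing `level_1_id` values for each `level`."""
    return df["level_1_id"].isna().groupby(df["level"]).mean()


# Toy frame with the card's column names; values are illustrative only.
toy = pd.DataFrame(
    {
        "level": [0, 1, 1, 1],
        "level_1_id": [None, "ZAF.4_1", "ZAF.9_1", "ZAF.6_1"],
    }
)

shares = null_share_by_level(toy)
print(shares.to_dict())
```

On the real data, a share of 1.0 at level 0 and 0.0 at level 1 would support the interpretation. Any other pattern would mean the nulls have a different cause.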
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `level` | 0.0 | 1.0 | 0.8919 | 1.0 |
| `year` | 2024.0 | 2026.0 | 2024.6148 | 2025.0 |
| `month` | 1.0 | 12.0 | 6.6659 | 7.0 |
| `emissionsquantity` | 0.0 | 122675.3399 | 2316.6402 | 11.1558 |
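
The gap between the mean (2316.6402) and the median (11.1558) of `emissionsquantity` points to a strongly right-skewed distribution. For regression tasks, one option is to model a log-transformed target. This is a suggestion rather than part of the source pipeline. The sketch below uses invented values chosen only to be skewed.

```python
import numpy as np
import pandas as pd

# Illustrative values only, chosen to be right-skewed.
emissions = pd.Series([0.0, 5.0, 11.0, 40.0, 120000.0], name="emissionsquantity")

# log1p(x) = log(1 + x) is defined at zero, which matters because
# the reported minimum of the column is 0.0.
log_emissions = np.log1p(emissions)

# expm1 inverts log1p, so predictions can be mapped back to the original scale.
restored = np.expm1(log_emissions)

print(emissions.mean() > emissions.median())
print(np.allclose(restored, emissions))
```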
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. One column with more than 80% missing values, `level_2_id`, was removed, as were 6,597 exact duplicate rows. The dataset was then split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
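
The cleaning steps described above could be approximated in pandas as follows. This is a hypothetical re-creation, not the actual Electric Sheep Africa pipeline, which the card does not publish. The function names `curate` and `split_80_20`, the exact-match handling of missing markers, the simplified snake_case rule, and the shuffle-then-cut split are all assumptions.

```python
import numpy as np
import pandas as pd

MISSING_MARKERS = ["N/A", "null", "none", "-", "unknown", "no data", "#N/A"]


def curate(raw: pd.DataFrame, max_missing: float = 0.8) -> pd.DataFrame:
    """Hypothetical re-creation of the cleaning steps listed in the card."""
    df = raw.copy()
    # Lowercase column names and replace spaces with underscores
    # (a simplified stand-in for full snake_case conversion).
    df.columns = [str(c).strip().lower().replace(" ", "_") for c in df.columns]
    # Unify the listed missing-value markers to NaN (exact string match assumed).
    df = df.replace(MISSING_MARKERS, np.nan)
    # Drop columns whose share of missing values exceeds the threshold.
    df = df.loc[:, df.isna().mean() <= max_missing]
    # Remove exact duplicate rows.
    return df.drop_duplicates().reset_index(drop=True)


def split_80_20(df: pd.DataFrame, seed: int = 42):
    """Shuffle with a fixed seed, then cut at 80% (the split method is assumed)."""
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    cut = int(len(shuffled) * 0.8)
    return shuffled.iloc[:cut], shuffled.iloc[cut:]


# Toy input with invented values to exercise each step.
raw = pd.DataFrame(
    {
        "Name": ["a", "b", "b", "c", "d", "e"],
        "Level 2 ID": ["N/A", "null", "null", "-", "unknown", "x"],
        "Value": [1, 2, 2, 3, 4, 5],
    }
)
clean = curate(raw)
train, test = split_80_20(clean)
print(list(clean.columns), len(clean), len(train), len(test))
```

In the toy input, the `Level 2 ID` column is more than 80% missing after marker unification and is dropped, one duplicate row is removed, and the remaining five rows are split four to one.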
---
## Limitations
- Data originates from Climate TRACE and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/zaf-climate-trace) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_zaf_climate_trace,
title = {South Africa: Greenhouse Gas and Air Pollutant Emissions},
author = {Climate TRACE},
year = {2026},
url = {https://data.humdata.org/dataset/zaf-climate-trace},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
---
Provider: electricsheepafrica



