electricsheepafrica/africa-stp-climate-trace
Hugging Face · Updated 2026-04-04 · Indexed 2026-04-12
Download link: https://hf-mirror.com/datasets/electricsheepafrica/africa-stp-climate-trace
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- climate-weather
- environment
- points-of-interest-poi
- stp
pretty_name: "Sao Tome and Principe: Greenhouse Gas and Air Pollutant Emissions"
dataset_info:
splits:
- name: train
num_examples: 1747
- name: test
num_examples: 436
---
# Sao Tome and Principe: Greenhouse Gas and Air Pollutant Emissions
**Publisher:** Climate TRACE · **Source:** [HDX](https://data.humdata.org/dataset/stp-climate-trace) · **License:** `cc-by` · **Updated:** 2026-03-30
---
## Abstract
Climate TRACE is a non-profit coalition of organizations building a timely, open, and accessible inventory of exactly where greenhouse gas (GHG) emissions come from. It estimates GHG and air pollutant emissions for more than 2.7 million sources (drawn from more than 744 million assets) and for every country in the world.
The Climate TRACE emissions inventory includes:
- Annual country-level emissions by sub-sector and gas, beginning in 2015
- Monthly source-level emissions by sub-sector and gas, with confidence levels, beginning in 2021
- Emissions source ownership, where and when available
Each row is one time-series observation: an emissions estimate for a given area, year, month, sector, and gas. Data was last updated on HDX on 2026-03-30. Geographic scope: **STP** (São Tomé and Príncipe).
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| Property | Value |
|---|---|
| **Domain** | Climate and environment |
| **Unit of observation** | Time-series observations |
| **Rows (total)** | 2,184 |
| **Columns** | 13 (4 numeric, 9 categorical, 0 datetime) |
| **Train split** | 1,747 rows |
| **Test split** | 436 rows |
| **Geographic scope** | STP |
| **Publisher** | Climate TRACE |
| **HDX last updated** | 2026-03-30 |
---
## Variables
**Temporal**: `year` (range 2024–2026), `month` (range 1–12).
**Geographic**: `level` (0 for the national STP rows, 1 for municipality rows), `full_name` ("Sao Tome and Principe", "São Tomé Municipality, STP", "Príncipe Municipality, STP"), `id` (STP, STP.2_1, STP.1_1), `level_0_id` (STP), `level_1_id` (STP.2_1, STP.1_1), `name` (Sao Tome and Principe, São Tomé Municipality, Príncipe Municipality).
**Measurement**: `emissionsquantity` (range 0.0–105.3959).
**Categorical**: `sector` (transportation, waste, agriculture), `gas` (ch4, i.e. methane).
**Metadata**: `esa_source` (HDX), `esa_processed` (2026-04-04).
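The table mixes national rows (`level` 0) with municipality rows (`level` 1), so summing `emissionsquantity` over all rows may count the same emissions twice. A minimal sketch, assuming `level` 0 rows carry the national figures, that turns those rows into a monthly series per sector; `monthly_national_series` is a hypothetical helper name, not part of the dataset or the `datasets` library.

```python
import pandas as pd


def monthly_national_series(df: pd.DataFrame) -> pd.DataFrame:
    """Return total emissionsquantity per month (rows) and sector (columns).

    Assumes rows with level == 0 are the national (STP) figures; municipal
    rows (level == 1) are excluded so they are not added on top of them.
    """
    national = df[df["level"] == 0].copy()
    # Build a month-start timestamp from the integer year and month columns.
    national["date"] = pd.to_datetime(national[["year", "month"]].assign(day=1))
    # Sum within each month and sector, then spread sectors into columns;
    # month/sector pairs with no rows are filled with 0.0.
    return national.pivot_table(
        index="date",
        columns="sector",
        values="emissionsquantity",
        aggfunc="sum",
        fill_value=0.0,
    ).sort_index()
```

Because the 80/20 split was random rather than by date, a series built from `train` alone will generally be incomplete; concatenating the splits first (`pd.concat([train, test])`) avoids that.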
---
## Quick Start
```python
from datasets import load_dataset
ds = load_dataset("electricsheepafrica/africa-stp-climate-trace")
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()
print(train.shape)
train.head()
```
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `full_name` | object | 0.0% | "Sao Tome and Principe", "São Tomé Municipality, STP", "Príncipe Municipality, STP" |
| `id` | object | 0.0% | STP, STP.2_1, STP.1_1 |
| `level` | int64 | 0.0% | 0.0 – 1.0 (mean 0.663) |
| `level_0_id` | object | 0.0% | STP |
| `level_1_id` | object | 33.7% | STP.2_1, STP.1_1 |
| `name` | object | 0.0% | Sao Tome and Principe, São Tomé Municipality, Príncipe Municipality |
| `year` | int64 | 0.0% | 2024.0 – 2026.0 (mean 2024.6085) |
| `month` | int64 | 0.0% | 1.0 – 12.0 (mean 6.6882) |
| `sector` | object | 0.0% | transportation, waste, agriculture |
| `gas` | object | 0.0% | ch4 |
| `emissionsquantity` | float64 | 0.0% | 0.0 – 105.3959 (mean 6.1657) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-04 |
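The schema stores nine columns as text (`object`), most with only a handful of distinct values, and keeps the `esa_processed` date as a string. A minimal sketch, assuming `category` dtypes suit your downstream tooling, that converts them; `tidy_dtypes` and `CATEGORICAL_COLUMNS` are hypothetical names.

```python
import pandas as pd

# Low-cardinality text columns taken from the schema table above.
CATEGORICAL_COLUMNS = [
    "full_name", "id", "level_0_id", "level_1_id",
    "name", "sector", "gas", "esa_source",
]


def tidy_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with category dtypes and a parsed esa_processed date."""
    out = df.copy()
    for col in CATEGORICAL_COLUMNS:
        if col in out.columns:
            # Missing values (e.g. in level_1_id) stay missing; they do not
            # become a category of their own.
            out[col] = out[col].astype("category")
    if "esa_processed" in out.columns:
        out["esa_processed"] = pd.to_datetime(out["esa_processed"], format="%Y-%m-%d")
    return out
```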
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `level` | 0.0 | 1.0 | 0.663 | 1.0 |
| `year` | 2024.0 | 2026.0 | 2024.6085 | 2025.0 |
| `month` | 1.0 | 12.0 | 6.6882 | 7.0 |
| `emissionsquantity` | 0.0 | 105.3959 | 6.1657 | 0.0071 |
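The mean of `emissionsquantity` (6.1657) is more than 800 times its median (0.0071), so the target is heavily right-skewed: at least half the rows are at or below 0.0071. For regression, one common option, not one the card prescribes, is to fit on log(1 + y) and map predictions back with exp(y) - 1. A minimal sketch with NumPy; `log_target` and `inverse_log_target` are hypothetical helper names.

```python
import numpy as np
import pandas as pd


def log_target(df: pd.DataFrame, column: str = "emissionsquantity") -> pd.Series:
    """Compress the long right tail with log(1 + x); the column minimum is 0.0."""
    return np.log1p(df[column])


def inverse_log_target(values) -> np.ndarray:
    """Map predictions made on the log1p scale back to the original units."""
    return np.expm1(np.asarray(values, dtype=float))
```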
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. One column with more than 80% missing values, `level_2_id`, was removed, as were 2,922 exact duplicate rows. The remaining data was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
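The curation steps above can be approximated in pandas. This is a minimal sketch, not Electric Sheep Africa's actual pipeline: the helper names (`curate`, `to_snake_case`, `MISSING_MARKERS`), the case-insensitive marker matching, and the use of `DataFrame.sample` for the 80/20 split are assumptions, so it will not necessarily reproduce the published partitions.

```python
import re

import numpy as np
import pandas as pd

# Markers listed in the curation notes; matching is case-insensitive here (assumption).
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def to_snake_case(name: str) -> str:
    """Lowercase a column name and collapse non-alphanumeric runs into '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def _blank_marker(value):
    """Return NaN for a recognised missing-value marker, otherwise the value."""
    if isinstance(value, str) and value.strip().lower() in MISSING_MARKERS:
        return np.nan
    return value


def curate(raw: pd.DataFrame, max_missing: float = 0.8, seed: int = 42):
    """Hypothetical re-creation of the curation steps; returns (train, test)."""
    df = raw.rename(columns=to_snake_case)
    # Unify textual missing-value markers to NaN, cell by cell.
    df = df.apply(lambda col: col.map(_blank_marker))
    # Drop columns whose share of missing values exceeds the threshold.
    df = df.loc[:, df.isna().mean() <= max_missing]
    # Remove exact duplicate rows and renumber the index.
    df = df.drop_duplicates().reset_index(drop=True)
    # Random 80/20 split with a fixed seed; test is every row not sampled.
    train = df.sample(frac=0.8, random_state=seed)
    test = df.drop(train.index)
    return train.reset_index(drop=True), test.reset_index(drop=True)
```

The partitions could then be saved with `train.to_parquet("train.parquet", compression="snappy")`, which requires `pyarrow` or `fastparquet`; the file names are illustrative.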
---
## Limitations
- Data originates from Climate TRACE and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- `level_1_id` has more than 20% missing values (33.7%) and should be treated with caution in modelling.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/stp-climate-trace) for the publisher's own methodology notes and caveats.
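The 33.7% missing share of `level_1_id` equals the share of `level` 0 rows implied by the `level` mean of 0.663, which suggests the gaps are the national rows rather than lost data. Under that assumption, one option is to coalesce `level_1_id` with `level_0_id` into a single area key instead of imputing it. A minimal sketch; `area_key` is a hypothetical helper that raises if a non-national row lacks `level_1_id`.

```python
import pandas as pd


def area_key(df: pd.DataFrame) -> pd.Series:
    """Return one area identifier per row without imputing level_1_id.

    Municipal rows keep their level_1_id; rows where it is missing (assumed
    to be the national, level == 0 rows) fall back to level_0_id.
    """
    # Check the assumption first: level_1_id may only be missing on level-0 rows.
    missing_on_municipal = df["level_1_id"].isna() & (df["level"] != 0)
    if missing_on_municipal.any():
        raise ValueError("level_1_id is missing on non-national rows")
    return df["level_1_id"].fillna(df["level_0_id"]).rename("area_key")
```

The `id` column appears to hold this combined key already (its sample values are STP, STP.2_1, STP.1_1), so `(area_key(train) == train["id"]).all()` is a cheap consistency check.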
---
## Citation
```bibtex
@dataset{hdx_africa_stp_climate_trace,
title = {Sao Tome and Principe: Greenhouse Gas and Air Pollutant Emissions},
author = {Climate TRACE},
year = {2026},
url = {https://data.humdata.org/dataset/stp-climate-trace},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provided by: electricsheepafrica



