
electricsheepafrica/africa-eri-climate-trace

Hugging Face · Updated 2026-04-04 · Indexed 2026-04-05
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-eri-climate-trace
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- climate-weather
- environment
- points-of-interest-poi
- eri
pretty_name: "Eritrea: Greenhouse Gas and Air Pollutant Emissions"
dataset_info:
  splits:
  - name: train
    num_examples: 3617
  - name: test
    num_examples: 904
---

# Eritrea: Greenhouse Gas and Air Pollutant Emissions

**Publisher:** Climate TRACE · **Source:** [HDX](https://data.humdata.org/dataset/eri-climate-trace) · **License:** `cc-by` · **Updated:** 2026-03-30

---

## Abstract

Climate TRACE is a non-profit coalition of organizations building a timely, open, and accessible inventory of exactly where greenhouse gas emissions are coming from. Climate TRACE estimates greenhouse gas (GHG) and air pollutant emissions for more than 2.7 million sources (drawn from more than 744 million assets) and for every country in the world. The Climate TRACE emissions inventory includes:

- Annual country-level emissions by sub-sector and gas, beginning in 2015
- Monthly source-level emissions by sub-sector and gas, beginning in 2021, with confidence information
- Emissions source ownership, where and when available

Each row in this dataset is one time-series observation. The data were last updated on HDX on 2026-03-30. Geographic scope: **ERI** (Eritrea).
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*

---

## Dataset Characteristics

| Property | Value |
|---|---|
| **Domain** | Climate and environment |
| **Unit of observation** | Time-series observation |
| **Rows (total)** | 4,522 |
| **Columns** | 13 (4 numeric, 9 categorical, 0 datetime) |
| **Train split** | 3,617 rows |
| **Test split** | 904 rows |
| **Geographic scope** | ERI |
| **Publisher** | Climate TRACE |
| **HDX last updated** | 2026-03-30 |

---

## Variables

**Temporal**: `year` (range 2024–2026), `month` (range 1–12).

**Geographic identifiers**: `level` (geographic hierarchy level, 0–1), `id` (ERI, ERI.2_1, ERI.1_1), `level_0_id` (ERI), `level_1_id` (ERI.2_1, ERI.1_1, ERI.4_1), `name` (Eritrea, Semenawi Keyih Bahri Region, Maekel Region), `full_name` (Eritrea; Semenawi Keyih Bahri Region, ERI; Maekel Region, ERI).

**Emissions**: `sector` (agriculture, buildings, waste), `gas` (ch4), `emissionsquantity` (range 0.0–15315.2041).

**Processing metadata**: `esa_source` (HDX), `esa_processed` (2026-04-04).
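Time is stored as two integer columns, `year` and `month`, and the card lists no datetime column. The sketch below assumes a pandas DataFrame with those columns, such as the output of `to_pandas()`. It combines them into a month-start timestamp for time-series work. The helper name `add_period_column`, the `date` column, and the toy rows are all hypothetical.

```python
import pandas as pd


def add_period_column(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a month-start `date` column built from `year` and `month`."""
    out = df.copy()
    # pd.to_datetime can assemble dates from a frame with year/month/day columns;
    # the day is fixed to 1 because the data are monthly.
    out["date"] = pd.to_datetime(
        pd.DataFrame({"year": out["year"], "month": out["month"], "day": 1})
    )
    return out


# Hypothetical toy rows, not real emissions values.
toy = pd.DataFrame({
    "id": ["ERI", "ERI.1_1"],
    "year": [2024, 2025],
    "month": [3, 11],
    "emissionsquantity": [10.0, 0.5],
})
print(add_period_column(toy)[["id", "date"]])
```

A real `date` column makes it straightforward to sort rows chronologically, resample, or plot a monthly series per `id` and `sector`.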
---

## Quick Start

```python
from datasets import load_dataset

ds = load_dataset("electricsheepafrica/africa-eri-climate-trace")
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()
print(train.shape)
train.head()
```

---

## Schema

| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `full_name` | object | 0.0% | Eritrea; Semenawi Keyih Bahri Region, ERI; Maekel Region, ERI |
| `id` | object | 0.0% | ERI, ERI.2_1, ERI.1_1 |
| `level` | int64 | 0.0% | 0.0 – 1.0 (mean 0.8379) |
| `level_0_id` | object | 0.0% | ERI |
| `level_1_id` | object | 16.2% | ERI.2_1, ERI.1_1, ERI.4_1 |
| `name` | object | 0.0% | Eritrea, Semenawi Keyih Bahri Region, Maekel Region |
| `year` | int64 | 0.0% | 2024.0 – 2026.0 (mean 2024.607) |
| `month` | int64 | 0.0% | 1.0 – 12.0 (mean 6.6875) |
| `sector` | object | 0.0% | agriculture, buildings, waste |
| `gas` | object | 0.0% | ch4 |
| `emissionsquantity` | float64 | 0.0% | 0.0 – 15315.2041 (mean 349.2147) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-04 |

---

## Numeric Summary

| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `level` | 0.0 | 1.0 | 0.8379 | 1.0 |
| `year` | 2024.0 | 2026.0 | 2024.607 | 2025.0 |
| `month` | 1.0 | 12.0 | 6.6875 | 7.0 |
| `emissionsquantity` | 0.0 | 15315.2041 | 349.2147 | 0.4305 |

---

## Curation

Raw data were downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. One column with more than 80% missing values, `level_2_id`, was removed, and 7,392 exact duplicate rows were dropped. The dataset was then split 80/20 into train and test partitions with a fixed random seed (42) and saved as Snappy-compressed Parquet.

---

## Limitations

- Data originate from Climate TRACE and have not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/eri-climate-trace) for the publisher's own methodology notes and caveats.

---

## Citation

```bibtex
@dataset{hdx_africa_eri_climate_trace,
  title = {Eritrea: Greenhouse Gas and Air Pollutant Emissions},
  author = {Climate TRACE},
  year = {2026},
  url = {https://data.humdata.org/dataset/eri-climate-trace},
  note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```

---

*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica): Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider:
electricsheepafrica
Compiled summary
Dataset introduction
Construction
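In climate science, high-quality emissions datasets are essential for evaluating environmental policy accurately. This dataset draws on the Climate TRACE coalition's global inventory of greenhouse gas and air pollutant emission sources, which covers more than 2.7 million sources, and restricts it to Eritrea. The raw data were retrieved from HDX through the CKAN API and passed through an automated cleaning pipeline. That pipeline unified missing-value markers, dropped the mostly empty `level_2_id` column, and removed duplicate rows. The result was converted to ML-ready Parquet and split 80:20 into training and test sets with a fixed random seed (42). This gives the data a regular structure and a reproducible held-out set for model validation.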
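The cleaning steps described above can be sketched in pandas as follows. This is a minimal reconstruction under stated assumptions, not Electric Sheep Africa's actual code. The marker list, the 80% threshold, and the seed come from the dataset card. The helper names, the case-insensitive marker matching, and the use of `DataFrame.sample` for the seeded split are hypothetical choices, so row counts need not match the published splits exactly.

```python
import re

import numpy as np
import pandas as pd

# Missing-value markers listed in the card's Curation section (matched case-insensitively here).
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def to_snake_case(name: str) -> str:
    """Lowercase a column name and collapse runs of non-alphanumeric characters into '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def unify_missing(value):
    """Map string missing-value markers to NaN; leave every other value unchanged."""
    if isinstance(value, str) and value.strip().lower() in MISSING_MARKERS:
        return np.nan
    return value


def curate(raw: pd.DataFrame, seed: int = 42, test_frac: float = 0.2):
    """Hypothetical re-implementation of the cleaning steps; returns (train, test)."""
    df = raw.rename(columns=to_snake_case)
    df = df.apply(lambda col: col.map(unify_missing))
    # Drop columns whose share of missing values exceeds 80%.
    df = df.loc[:, df.isna().mean() <= 0.8]
    # Remove exact duplicate rows.
    df = df.drop_duplicates().reset_index(drop=True)
    # Seeded 80/20 split: sample the test rows, keep the remainder for training.
    test = df.sample(frac=test_frac, random_state=seed)
    train = df.drop(index=test.index)
    return train, test


# Hypothetical raw extract: one duplicated row and one column made entirely of markers.
raw = pd.DataFrame({
    "Full Name": ["Eritrea", "Eritrea", "Maekel Region, ERI",
                  "Maekel Region, ERI", "Semenawi Keyih Bahri Region, ERI"],
    "Level_2_ID": ["N/A", "N/A", "null", "-", "No Data"],
    "Emissions Quantity": [1.0, 1.0, 2.0, 3.0, 4.0],
})
train, test = curate(raw)
print(train.columns.tolist(), len(train), len(test))
```

Deduplication runs after marker unification, so rows that differ only in how a missing value was spelled (`N/A` vs. `null`) are still treated as exact duplicates.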
Characteristics
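The dataset has both a temporal and a spatial dimension: it records sector-level emissions for Eritrea as a monthly time series spanning 2024 to 2026. Its 13 columns cover geographic identifiers, year and month, emission sector, and gas type. The `emissionsquantity` column ranges from 0 to about 15,315. Its median of 0.43 against a mean of 349 shows that the distribution is heavily skewed. Hierarchical geographic codes nest the regional rows under the national row, linking the country and sub-national levels. The records track methane emissions from key sectors such as agriculture, buildings, and waste, which supports environmental modeling at more than one spatial scale.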
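The national/regional nesting can be used by splitting on `level`. The sketch below assumes that `level` 0 marks the national (ERI) row and 1 marks first-level regions. That reading is an inference from the card: `level_1_id` is empty for 16.2% of rows, which matches the share of `level` = 0 implied by its mean of 0.8379. The sketch sums regional emissions per year, month, sector, and gas and places the result beside the national figure. Whether the two should agree depends on Climate TRACE's allocation method, which the card does not document. The function name and toy values are hypothetical.

```python
import pandas as pd

KEYS = ["year", "month", "sector", "gas"]


def national_vs_regional(df: pd.DataFrame) -> pd.DataFrame:
    """Compare national (level 0) emissions with the sum of regional (level 1) rows."""
    national = (
        df[df["level"] == 0]
        .groupby(KEYS, as_index=False)["emissionsquantity"].sum()
        .rename(columns={"emissionsquantity": "national"})
    )
    regional = (
        df[df["level"] == 1]
        .groupby(KEYS, as_index=False)["emissionsquantity"].sum()
        .rename(columns={"emissionsquantity": "regional_sum"})
    )
    # Outer join keeps keys that appear at only one level of the hierarchy.
    merged = national.merge(regional, on=KEYS, how="outer")
    merged["gap"] = merged["national"] - merged["regional_sum"]
    return merged


# Hypothetical toy rows: one national row and two regional rows for the same month.
toy = pd.DataFrame({
    "level": [0, 1, 1],
    "level_1_id": [None, "ERI.1_1", "ERI.2_1"],
    "year": [2024, 2024, 2024],
    "month": [1, 1, 1],
    "sector": ["waste", "waste", "waste"],
    "gas": ["ch4", "ch4", "ch4"],
    "emissionsquantity": [10.0, 6.0, 3.5],
})
print(national_vs_regional(toy))
```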
Usage
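In practice, the dataset can be loaded with Hugging Face tooling. Call the `load_dataset` function from the `datasets` library with the dataset path, then convert each split to a pandas DataFrame for exploratory analysis. Train and test splits are predefined, so the data can be used directly for tabular classification or regression tasks. One example is predicting greenhouse gas emissions from sector and time features. The geographic hierarchy identifiers support studies of spatial heterogeneity, and the `month` column supports analysis of seasonal emission patterns. Note that the source data have not been independently validated, and cross-check Climate TRACE's official methodology documentation to keep analyses rigorous.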
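The seasonal analysis mentioned above can be started with a simple grouped summary. The sketch assumes a DataFrame with the card's `sector`, `month`, and `emissionsquantity` columns, for example `ds["train"].to_pandas()` from the Quick Start. Because the emission distribution is heavily skewed (median 0.43 vs. mean 349), the median is reported alongside the mean. The helper names and the toy frame are hypothetical stand-ins for the real data.

```python
import pandas as pd


def monthly_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, median, and row count of emissions per sector and calendar month."""
    return (
        df.groupby(["sector", "month"])["emissionsquantity"]
        .agg(["mean", "median", "count"])
        .reset_index()
    )


def peak_month(profile: pd.DataFrame) -> pd.Series:
    """Calendar month with the highest mean emissions for each sector."""
    idx = profile.groupby("sector")["mean"].idxmax()
    return profile.loc[idx].set_index("sector")["month"]


# Hypothetical toy values standing in for the real data.
toy = pd.DataFrame({
    "sector": ["agriculture"] * 4 + ["waste"] * 4,
    "month": [1, 1, 7, 7, 1, 1, 7, 7],
    "emissionsquantity": [2.0, 4.0, 9.0, 11.0, 5.0, 7.0, 1.0, 1.0],
})
profile = monthly_profile(toy)
print(peak_month(profile))
```

With only 2024–2026 available, each calendar month has at most three years behind it, so an apparent peak is worth checking against the `count` column before it is read as a seasonal signal.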
Background and Challenges
Background
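Global climate change makes it urgent to track and model greenhouse gas and air pollutant emissions precisely, and this has become a central topic in environmental science. The emissions inventory built by the non-profit coalition Climate TRACE integrates time-series data for more than 2.7 million emission sources worldwide. Its aim is to give policymakers and researchers transparent, timely, and accessible quantitative evidence. This dataset focuses on emissions in Eritrea (ERI) and covers methane emissions from key sectors such as agriculture, buildings, and waste. Electric Sheep Africa repackaged it for machine learning in 2026, in a structured format that supports regression and classification tasks. The result is a data foundation for regional environmental governance and for validating climate models.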
Current Challenges
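The dataset addresses the problem of monitoring regional greenhouse gas emission dynamics and attributing emissions to their sources. The core challenge is extracting reliable temporal patterns from heterogeneous, fragmented source data and accurately quantifying the contributions of different sectors and gases. Cleaning the data meant unifying inconsistent missing-value markers and removing one column (`level_2_id`) with more than 80% missing values. It also meant dropping 7,392 exact duplicate rows, all while staying consistent with the original source's methodology. Finally, the emission figures are estimates based on indirect modeling and remote sensing, so they may carry definitional biases and sampling limitations. Downstream analyses should therefore take the uncertainty of the data into account.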
Common Scenarios
Typical Use Case
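For climate change and environmental science, the dataset gives researchers time-series emission estimates for greenhouse gases and air pollutants in Eritrea. Its typical use is building machine learning models that predict methane emission trends for different sectors, such as agriculture, buildings, and waste. Using variables such as year, month, and emission quantity, researchers can train regression or classification models. They can also analyze how emission patterns relate to seasonal variation, shedding light on regional environmental dynamics.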
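A regression baseline worth beating before trying more complex models is sketched below. It assumes the train/test DataFrames from the Quick Start. For each test row, it predicts the mean of `log1p(emissionsquantity)` over training rows with the same `sector` and `month`, falling back to the overall training mean for unseen groups. Predictions are scored with mean absolute error in log space. The log transform is an assumption motivated by the skew in the Numeric Summary (median 0.43 vs. mean 349). The function name and toy frames are hypothetical.

```python
import numpy as np
import pandas as pd

GROUP = ["sector", "month"]


def grouped_log_baseline(train: pd.DataFrame, test: pd.DataFrame) -> float:
    """Fit a per-(sector, month) mean on log1p emissions; return test MAE in log space."""
    y_train = np.log1p(train["emissionsquantity"])
    means = (
        y_train.groupby([train[c] for c in GROUP]).mean()
        .rename("pred")
        .reset_index()
    )
    fallback = y_train.mean()
    # A left merge keeps every test row in order; unseen (sector, month) pairs
    # get NaN and are filled with the global training mean.
    pred = test[GROUP].merge(means, on=GROUP, how="left")["pred"].fillna(fallback)
    y_test = np.log1p(test["emissionsquantity"]).to_numpy()
    return float(np.mean(np.abs(pred.to_numpy() - y_test)))


# Hypothetical toy frames; expm1 makes the log1p targets easy to reason about.
toy_train = pd.DataFrame({
    "sector": ["waste", "waste", "agriculture"],
    "month": [1, 1, 1],
    "emissionsquantity": np.expm1([1.0, 3.0, 5.0]),
})
toy_test = pd.DataFrame({
    "sector": ["waste", "agriculture", "buildings"],
    "month": [1, 1, 2],
    "emissionsquantity": np.expm1([2.5, 5.0, 4.0]),
})
print(grouped_log_baseline(toy_train, toy_test))
```

A learned model (for example gradient boosting on sector, month, year, and region) only adds value if it clearly improves on this number.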
Related Derived Work
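Possible derived work includes machine-learning-based emission forecasting frameworks and regional climate risk assessment models. For example, the dataset's time-series features could be used to train long short-term memory (LSTM) networks or random forests that simulate future emission scenarios. Combining the data with geographic information systems (GIS) could support studies of the spatial heterogeneity of emissions and offer methodological input for building global climate databases.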
Recent Research on the Dataset
Latest Research Directions
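Where climate science meets humanitarian data, this dataset supports fine-grained spatiotemporal modeling of greenhouse gas and air pollutant emissions in Eritrea. Current research uses high-resolution emissions inventories of this kind, combined with machine learning, to study methane emission dynamics in key African sectors such as agriculture, buildings, and waste. Global climate action increasingly demands emissions transparency from developing countries. Against that background, the dataset offers an empirical basis for assessing regional climate policy and for tracking progress toward the UN Sustainable Development Goals. Such research can deepen understanding of emission sources in East Africa and provide data-driven support for international climate finance and adaptation planning.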
The content above was collected and summarized by 遇见数据集.