electricsheepafrica/africa-cod-rainfall-subnational
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 100K<n<1M
source_datasets:
- original
task_categories:
- tabular-regression
- other
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- climate-weather
- environment
- cod
pretty_name: "Democratic Republic of the Congo: Rainfall Indicators at Subnational Level"
dataset_info:
  splits:
  - name: train
    num_examples: 322088
  - name: test
    num_examples: 80522
---
# Democratic Republic of the Congo: Rainfall Indicators at Subnational Level
**Publisher:** WFP - World Food Programme · **Source:** [HDX](https://data.humdata.org/dataset/cod-rainfall-subnational) · **License:** `cc-by` · **Updated:** 2026-04-03
---
## Abstract
This dataset contains dekadal (10-day) rainfall indicators computed from Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) version 2, which blends satellite imagery with in-situ station data, and from the CHIRPS-GEFS short-term rainfall forecasts. The indicators are aggregated by subnational administrative units.
Included indicators are (for each dekad):
- 10 day rainfall [mm] (`rfh`)
- rainfall 1-month rolling aggregation [mm] (`r1h`)
- rainfall 3-month rolling aggregation [mm] (`r3h`)
- rainfall long term average [mm] (`rfh_avg`)
- rainfall 1-month rolling aggregation long term average [mm] (`r1h_avg`)
- rainfall 3-month rolling aggregation long term average [mm] (`r3h_avg`)
- rainfall anomaly [%] (`rfq`)
- rainfall 1-month anomaly [%] (`r1q`)
- rainfall 3-month anomaly [%] (`r3q`)
The administrative units used for aggregation are based on WFP data, and each unit carries a Pcode reference. The number of input pixels used to create the aggregates is provided in the `n_pixels` column. Finally, the `version` column (called `type` in the publisher's description) indicates whether a value is based on a forecast, a preliminary product, or a final product.
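The anomaly columns have means close to 100 in the Schema and Numeric Summary tables of this card, which suggests they express rainfall as a percentage of the long-term average. The sketch below computes that ratio on toy values. It is an assumption, not the publisher's documented formula: `percent_of_average` is a hypothetical helper, and the published `rfq` may include smoothing for near-zero averages, so small differences should be expected.

```python
import pandas as pd

def percent_of_average(value: pd.Series, average: pd.Series) -> pd.Series:
    """Return value as a percentage of average; NaN where average is not positive."""
    safe_average = average.where(average > 0)  # non-positive averages become NaN
    return 100.0 * value / safe_average

# Toy values, not taken from the dataset.
toy = pd.DataFrame({"rfh": [30.0, 60.0, 0.0], "rfh_avg": [60.0, 60.0, 0.0]})
toy["rfq_sketch"] = percent_of_average(toy["rfh"], toy["rfh_avg"])
print(toy)
```

Comparing `rfq_sketch` with the published `rfq` on real rows is a quick way to check whether this assumption holds for your use.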
Forecasts are issued on the 6th, 16th, and 26th of each month for the upcoming 10-day period (dekad), then updated with improved versions on the 1st, 11th, and 21st.
Preliminary observations replace the previous dekad's forecast on the 3rd, 13th, and 23rd. They are later replaced by final observations, which are published mid-month (on the 13th or 23rd) and cover all three dekads of the prior month. The forecast schedule is summarised below:
| Publication day | Forecast type | Covers (dekad) |
|---|---|---|
| 1st | Updated forecast | 1–10 of the same month |
| 6th | Initial forecast | 11–20 of the same month |
| 11th | Updated forecast | 11–20 of the same month |
| 16th | Initial forecast | 21–end of the same month |
| 21st | Updated forecast | 21–end of the same month |
| 26th | Initial forecast | 1–10 of the following month |
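The dekads in the schedule above split each month into days 1–10, 11–20, and 21 to the end of the month, so the third dekad varies in length. As a minimal sketch, the hypothetical helper `dekad_bounds` below returns the first and last day of the dekad containing a given date; it is an illustration of the convention, not code from the publisher.

```python
import calendar
from datetime import date

def dekad_bounds(d: date) -> tuple[date, date]:
    """Return (first_day, last_day) of the dekad that contains d."""
    if d.day <= 10:
        start, end = 1, 10
    elif d.day <= 20:
        start, end = 11, 20
    else:
        # The third dekad runs to the last day of the month (28 to 31).
        start, end = 21, calendar.monthrange(d.year, d.month)[1]
    return date(d.year, d.month, start), date(d.year, d.month, end)

print(dekad_bounds(date(2024, 2, 25)))
```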
For more on CHIRPS-GEFS forecasts, see: https://www.chc.ucsb.edu/data/chirps-gefs
For further details, please see the methodology section.
Each row holds the indicators for one administrative unit and one dekad. Temporal coverage is given by the `date` column. Geographic scope: **COD**.
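Because forecasts are replaced by preliminary and then final observations, it can be useful to keep only the most authoritative row per unit and dekad. The card does not say whether the repackaged Parquet ever holds more than one version for the same `pcode` and `date`; if it does not, the hypothetical helper below simply leaves the data unchanged. The values `final`, `prelim`, and `forecast` are the ones listed for the `version` column in the Schema.

```python
import pandas as pd

# Lower rank means more authoritative, following the replacement order described above.
VERSION_RANK = {"final": 0, "prelim": 1, "forecast": 2}

def keep_most_authoritative(df: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per (pcode, date), preferring final over prelim over forecast."""
    ranked = df.assign(_rank=df["version"].map(VERSION_RANK))
    ranked = ranked.sort_values(["pcode", "date", "_rank"])
    deduped = ranked.drop_duplicates(subset=["pcode", "date"], keep="first")
    return deduped.drop(columns="_rank").reset_index(drop=True)

# Toy rows, not taken from the dataset.
toy = pd.DataFrame({
    "pcode": ["CD62", "CD62", "CD62"],
    "date": pd.to_datetime(["2026-03-01", "2026-03-01", "2026-03-11"]),
    "version": ["prelim", "final", "forecast"],
    "rfh": [41.0, 44.0, 52.0],
})
result = keep_most_authoritative(toy)
print(result)
```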
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| Property | Value |
|---|---|
| **Domain** | Climate and environment |
| **Unit of observation** | Time-series observations |
| **Rows (total)** | 402,610 |
| **Columns** | 17 (12 numeric, 4 categorical, 1 datetime) |
| **Train split** | 322,088 rows |
| **Test split** | 80,522 rows |
| **Geographic scope** | COD |
| **Publisher** | WFP - World Food Programme |
| **HDX last updated** | 2026-04-03 |
---
## Variables
**Geographic** — `n_pixels` (range 1.0–6524.0).
**Temporal** — `date`.
**Identifier / Metadata** — `adm_id` (range 900143.0–1011057.0), `pcode` (CD5407, CD62, CD8302), `esa_source` (HDX), `esa_processed` (2026-04-06).
**Other** — `adm_level` (range 1.0–2.0), `rfh` (range 0.0–274.0), `rfh_avg` (range 0.0–115.6653), `r1h` (range 0.0–544.0), `r1h_avg` (range 0.0–304.4258) and 6 others.
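Since `n_pixels` records how many input pixels went into each aggregate, it can serve as a weight when combining units into a larger area. The sketch below is one possible approach, not the publisher's method: `pixel_weighted_mean` is a hypothetical helper, and it assumes that pixel counts are proportional to area and that units within one `adm_level` do not overlap. It filters to a single level first, because mixing levels 1 and 2 would count the same territory twice.

```python
import pandas as pd

def pixel_weighted_mean(df: pd.DataFrame, value_col: str = "rfh", adm_level: int = 2) -> pd.Series:
    """Pixel-weighted mean of value_col per date across units of one admin level."""
    units = df[df["adm_level"] == adm_level].dropna(subset=[value_col, "n_pixels"])
    weighted_sum = (units[value_col] * units["n_pixels"]).groupby(units["date"]).sum()
    total_pixels = units["n_pixels"].groupby(units["date"]).sum()
    return (weighted_sum / total_pixels).rename(f"{value_col}_weighted")

# Toy rows, not taken from the dataset; the level-1 row is excluded by the filter.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2026-03-01"] * 3),
    "adm_level": [2, 2, 1],
    "n_pixels": [100.0, 300.0, 400.0],
    "rfh": [10.0, 40.0, 99.0],
})
result = pixel_weighted_mean(toy)
print(result)
```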
---
## Quick Start
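```python
from datasets import load_dataset

# Download both splits from the Hugging Face Hub.
ds = load_dataset("electricsheepafrica/africa-cod-rainfall-subnational")

# Convert each split to a pandas DataFrame.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())  # wrapped in print() so it also shows outside a notebook
```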
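The train and test partitions are random row samples, so each one has gaps in every unit's time series. For any per-unit time-series operation, it is safer to concatenate both partitions first. The sketch below then approximates the rolling indicators from `rfh`, assuming that one month corresponds to 3 dekads and three months to 9. That window length is an assumption of this example, and `add_rolling_sums` is a hypothetical helper, so the results may not match the published `r1h` and `r3h` exactly.

```python
import pandas as pd

def add_rolling_sums(df: pd.DataFrame) -> pd.DataFrame:
    """Add 3-dekad and 9-dekad rolling sums of rfh, computed separately per pcode."""
    out = df.sort_values(["pcode", "date"]).copy()
    by_unit = out.groupby("pcode")["rfh"]
    # The first rows of each unit stay NaN until a full window is available.
    out["r1h_sketch"] = by_unit.transform(lambda s: s.rolling(3).sum())
    out["r3h_sketch"] = by_unit.transform(lambda s: s.rolling(9).sum())
    return out

# With the real data, concatenate the partitions before calling the helper:
# full = pd.concat([train, test], ignore_index=True)

# Toy series with evenly spaced dates (not exact dekad starts).
toy = pd.DataFrame({
    "pcode": ["CD62"] * 10,
    "date": pd.date_range("2026-01-01", periods=10, freq="10D"),
    "rfh": [10.0] * 10,
})
result = add_rolling_sums(toy)
print(result[["date", "rfh", "r1h_sketch", "r3h_sketch"]])
```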
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `date` | datetime64[ns] | 0.0% | |
| `adm_level` | int64 | 0.0% | 1.0 – 2.0 (mean 1.8826) |
| `adm_id` | int64 | 0.0% | 900143.0 – 1011057.0 (mean 991009.085) |
| `pcode` | object | 0.0% | CD5407, CD62, CD8302 |
| `n_pixels` | float64 | 0.0% | 1.0 – 6524.0 (mean 608.0243) |
| `rfh` | float64 | 0.0% | 0.0 – 274.0 (mean 42.9547) |
| `rfh_avg` | float64 | 0.0% | 0.0 – 115.6653 (mean 42.7433) |
| `r1h` | float64 | 0.1% | 0.0 – 544.0 (mean 128.8135) |
| `r1h_avg` | float64 | 0.1% | 0.0 – 304.4258 (mean 128.2022) |
| `r3h` | float64 | 0.5% | 0.0 – 1095.5835 (mean 386.2572) |
| `r3h_avg` | float64 | 0.5% | 0.0051 – 827.0353 (mean 384.3607) |
| `rfq` | float64 | 0.0% | 10.808 – 632.5679 (mean 100.626) |
| `r1q` | float64 | 0.1% | 8.94 – 525.1832 (mean 100.616) |
| `r3q` | float64 | 0.5% | 13.5225 – 522.8425 (mean 100.6275) |
| `version` | object | 0.0% | final, prelim, forecast |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-06 |
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `adm_level` | 1.0 | 2.0 | 1.8826 | 2.0 |
| `adm_id` | 900143.0 | 1011057.0 | 991009.085 | 1002957.0 |
| `n_pixels` | 1.0 | 6524.0 | 608.0243 | 358.0 |
| `rfh` | 0.0 | 274.0 | 42.9547 | 44.0 |
| `rfh_avg` | 0.0 | 115.6653 | 42.7433 | 48.2167 |
| `r1h` | 0.0 | 544.0 | 128.8135 | 140.15 |
| `r1h_avg` | 0.0 | 304.4258 | 128.2022 | 146.4885 |
| `r3h` | 0.0 | 1095.5835 | 386.2572 | 422.0 |
| `r3h_avg` | 0.0051 | 827.0353 | 384.3607 | 429.6826 |
| `rfq` | 10.808 | 632.5679 | 100.626 | 98.1046 |
| `r1q` | 8.94 | 525.1832 | 100.616 | 99.1631 |
| `r3q` | 13.5225 | 522.8425 | 100.6275 | 99.8921 |
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. One column was cast from string to numeric or datetime, based on a parse-success rate above the 85% threshold. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
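Because the 80/20 split is random, rows from the same or neighbouring dekads can land in both partitions. For forecasting tasks where that overlap matters, one option is to recombine the partitions and split by date instead. The sketch below shows the idea on toy rows; `chronological_split` and the cutoff date are hypothetical choices for illustration, not part of the curation pipeline.

```python
import pandas as pd

def chronological_split(df: pd.DataFrame, cutoff: str) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split rows into those dated before the cutoff and those on or after it."""
    cutoff_ts = pd.Timestamp(cutoff)
    ordered = df.sort_values("date")
    past = ordered[ordered["date"] < cutoff_ts].reset_index(drop=True)
    future = ordered[ordered["date"] >= cutoff_ts].reset_index(drop=True)
    return past, future

# With the real data, recombine the published partitions first:
# full = pd.concat([train, test], ignore_index=True)

# Toy rows, not taken from the dataset.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2026-01-11", "2025-12-21", "2026-01-01"]),
    "rfh": [52.0, 38.0, 44.0],
})
past, future = chronological_split(toy, "2026-01-01")
print(len(past), len(future))
```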
---
## Limitations
- Data originates from WFP - World Food Programme and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/cod-rainfall-subnational) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_cod_rainfall_subnational,
title = {Democratic Republic of the Congo: Rainfall Indicators at Subnational Level},
author = {WFP - World Food Programme},
year = {2026},
url = {https://data.humdata.org/dataset/cod-rainfall-subnational},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*