
electricsheepafrica/africa-east-africa-chirps-seasonal-rainfall-accumulation-anomaly-by-pentad

Hugging Face · Updated 2026-04-07 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-east-africa-chirps-seasonal-rainfall-accumulation-anomaly-by-pentad
Description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
- other
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- climate-weather
- hazards-and-risk
- eth
- ken
- som
pretty_name: "East Africa - CHIRPS Seasonal Rainfall Accumulation Anomaly by Pentad"
dataset_info:
  splits:
  - name: train
    num_examples: 2121
  - name: test
    num_examples: 530
---

# East Africa - CHIRPS Seasonal Rainfall Accumulation Anomaly by Pentad

**Publisher:** HDX · **Source:** [HDX](https://data.humdata.org/dataset/east-africa-chirps-seasonal-rainfall-accumulation-anomaly-by-pentad) · **License:** `cc-by` · **Updated:** 2025-09-12

---

## Abstract

Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) is a 35+ year quasi-global rainfall data set. It is a gridded rainfall time series for trend analysis and seasonal drought monitoring that spans 50°S-50°N (all longitudes) and runs from 1981 to near-present. The anomaly is the difference, in millimeters, between current rainfall and the average rainfall for 1981 to 2010. For more information visit the [CHIRPS site](https://www.chc.ucsb.edu/data/chirps).

This dataset contains the latest available CHIRPS anomaly data. The full list of available data can be found at USGS for [Mar-May data](https://edcintl.cr.usgs.gov/downloads/sciweb1/shared/fews/web/africa/east/pentadal/chirps/seasaccum/marmay/anom/lta/downloads/pentadal/), [Oct-Dec data](https://edcintl.cr.usgs.gov/downloads/sciweb1/shared/fews/web/africa/east/pentadal/chirps/seasaccum/octdec/anom/lta/downloads/pentadal/), and others.

Additionally, subnational statistics (mean, min, max) have been calculated for Ethiopia, Kenya, and Somalia and are available in the CSV resource.

Each row in this dataset is one tabular record.
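
The anomaly definition in the abstract (current rainfall minus the 1981 to 2010 average, in millimeters) can be illustrated with a small worked example. This is only a sketch of the arithmetic: the function name `rainfall_anomaly_mm` and all rainfall values are made up for illustration, and it does not reproduce the CHIRPS processing, which operates on gridded data.

```python
from statistics import mean


def rainfall_anomaly_mm(current_mm, baseline_mm):
    """Return current accumulation minus the baseline average, in mm."""
    return current_mm - mean(baseline_mm)


# Made-up seasonal accumulations (mm) standing in for the baseline years.
baseline = [210.0, 190.0, 200.0]

# A negative anomaly means drier than the baseline average,
# a positive anomaly means wetter.
dry_season = rainfall_anomaly_mm(150.0, baseline)
wet_season = rainfall_anomaly_mm(260.0, baseline)
print(dry_season, wet_season)
```

Read this way, negative values in the `chirps_*` columns indicate below-average accumulation and positive values indicate above-average accumulation.
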
Data was last updated on HDX on 2025-09-12. Geographic scope: **ETH, KEN, SOM**.

*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*

---

## Dataset Characteristics

| | |
|---|---|
| **Domain** | Climate and environment |
| **Unit of observation** | Tabular records |
| **Rows (total)** | 2,652 |
| **Columns** | 18 (4 numeric, 14 categorical, 0 datetime) |
| **Train split** | 2,121 rows |
| **Test split** | 530 rows |
| **Geographic scope** | ETH, KEN, SOM |
| **Publisher** | HDX |
| **HDX last updated** | 2025-09-12 |

---

## Variables

**Geographic:** `chirps_max` (range -148.0575–1502.7864).

**Temporal:** `season`.

**Identifier / Metadata:** `adm0_pcode` (ET, SO, KE), `adm0_ref` (Ethiopia, Somalia, Kenya), `adm1_pcode` (ET04, ET03, ET07), `adm1_ref` (Oromia, Amhara, SNNP), `adm2_pcode` (ET1600, ET0305, ET0304) and 7 others.

**Other:** `alpha_3` (ETH, SOM, KEN), `adm_level` (range 1.0–3.0), `chirps_mean` (range -148.0575–837.3773), `chirps_min` (range -284.2439–544.8214).
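
As a rough illustration of how the variables above combine, the sketch below keeps the rows for one country and one administrative level and skips rows whose `chirps_mean` is missing. The helper `select_rows` is hypothetical, and the `chirps_mean` values in `sample_rows` are made up; only the column names and the sample codes come from this card. It works on any iterable of row dictionaries.

```python
import math


def select_rows(rows, alpha_3, adm_level):
    """Keep rows for one country and admin level that have a chirps_mean."""
    kept = []
    for row in rows:
        value = row.get("chirps_mean")
        # Missing statistics may show up as None or as a float NaN.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        if row["alpha_3"] == alpha_3 and row["adm_level"] == adm_level:
            kept.append(row)
    return kept


# Illustrative rows: codes are samples from the card, values are invented.
sample_rows = [
    {"alpha_3": "ETH", "adm_level": 1, "adm_pcode": "ET04", "chirps_mean": 12.5},
    {"alpha_3": "ETH", "adm_level": 1, "adm_pcode": "ET03", "chirps_mean": None},
    {"alpha_3": "ETH", "adm_level": 3, "adm_pcode": "ET010101", "chirps_mean": -30.0},
    {"alpha_3": "KEN", "adm_level": 1, "chirps_mean": 4.0},
]

ethiopia_adm1 = select_rows(sample_rows, "ETH", 1)
print([row["adm_pcode"] for row in ethiopia_adm1])
```

Filtering on `adm_level` first is useful because the table mixes several administrative levels, so statistics for nested units would otherwise be counted together.
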
---

## Quick Start

```python
from datasets import load_dataset

# Download both splits from the Hugging Face Hub.
ds = load_dataset("electricsheepafrica/africa-east-africa-chirps-seasonal-rainfall-accumulation-anomaly-by-pentad")

# Convert each split to a pandas DataFrame.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())
```

---

## Schema

| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `alpha_3` | object | 0.0% | ETH, SOM, KEN |
| `adm0_pcode` | object | 0.0% | ET, SO, KE |
| `adm0_ref` | object | 0.0% | Ethiopia, Somalia, Kenya |
| `adm1_pcode` | object | 0.0% | ET04, ET03, ET07 |
| `adm1_ref` | object | 0.0% | Oromia, Amhara, SNNP |
| `adm2_pcode` | object | 5.9% | ET1600, ET0305, ET0304 |
| `adm2_ref` | object | 5.9% | Sidama, North Shewa (AM), South Wello |
| `adm3_pcode` | object | 18.4% | ET010101, ET050503, ET050392 |
| `adm3_ref` | object | 18.4% | Tahtay Adiyabo, Shilabo, Daror |
| `adm_level` | int64 | 0.0% | 1.0 – 3.0 (mean 2.7572) |
| `adm_pcode` | object | 0.0% | ET010101, ET070706, ET070704 |
| `adm_ref` | object | 0.0% | |
| `season` | object | 0.0% | |
| `chirps_mean` | float64 | 10.5% | -148.0575 – 837.3773 (mean 69.3937) |
| `chirps_min` | float64 | 10.5% | -284.2439 – 544.8214 (mean 32.4136) |
| `chirps_max` | float64 | 10.5% | -148.0575 – 1502.7864 (mean 113.4136) |
| `esa_source` | object | 0.0% | |
| `esa_processed` | object | 0.0% | |

---

## Numeric Summary

| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `adm_level` | 1.0 | 3.0 | 2.7572 | 3.0 |
| `chirps_mean` | -148.0575 | 837.3773 | 69.3937 | 29.6887 |
| `chirps_min` | -284.2439 | 544.8214 | 32.4136 | 5.2211 |
| `chirps_max` | -148.0575 | 1502.7864 | 113.4136 | 64.4659 |

---

## Curation

Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`.
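
The two cleaning steps just described (snake_case column names and unified missing-value markers) could look roughly like the sketch below. It is not the actual curation code: the helper names `to_snake_case` and `normalise_missing`, the example raw column names, and the choice to match markers case-insensitively are all assumptions made for illustration.

```python
import re

# Markers listed in the card, stored lowercased (case-insensitive matching
# is an assumption of this sketch).
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def to_snake_case(name):
    """Lowercase a column name and join its words with underscores."""
    # Insert an underscore at lower/digit-to-upper boundaries (camelCase).
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Collapse any run of non-alphanumeric characters into one underscore.
    s = re.sub(r"[^0-9A-Za-z]+", "_", s)
    return s.strip("_").lower()


def normalise_missing(value):
    """Map a known missing-value marker to NaN; leave other values as-is."""
    if isinstance(value, str) and value.strip().lower() in MISSING_MARKERS:
        return float("nan")
    return value


# Hypothetical raw column names and cell values.
print(to_snake_case("CHIRPS Mean"), to_snake_case("adm0Pcode"))
print(normalise_missing("N/A"), normalise_missing("-5.2"))
```

Only whole-cell matches are treated as missing, so a legitimate negative value such as `-5.2` is not confused with the `-` marker.
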
The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.

---

## Limitations

- Data originates from HDX and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- This dataset spans 3 countries; geographic and methodological inconsistencies across national boundaries may affect cross-country comparability.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/east-africa-chirps-seasonal-rainfall-accumulation-anomaly-by-pentad) for the publisher's own methodology notes and caveats.

---

## Citation

```bibtex
@dataset{hdx_africa_east_africa_chirps_seasonal_rainfall_accumulation_anomaly_by_pentad,
  title = {East Africa - CHIRPS Seasonal Rainfall Accumulation Anomaly by Pentad},
  author = {HDX},
  year = {2025},
  url = {https://data.humdata.org/dataset/east-africa-chirps-seasonal-rainfall-accumulation-anomaly-by-pentad},
  note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```

---

*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica): Africa's ML dataset infrastructure. Lagos, Nigeria.*

Provider:
electricsheepafrica