electricsheepafrica/africa-somalia-pin-targeted-reached-by-location-and-cluster
Hugging Face · Updated 2026-04-11 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-somalia-pin-targeted-reached-by-location-and-cluster
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- tabular-classification
- other
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- affected-population
- drought
- hxl
- people-in-need-pin
- som
pretty_name: "Somalia Drought Related - People Affected, Targeted & Reached by Location"
dataset_info:
  splits:
  - name: train
    num_examples: 60
  - name: test
    num_examples: 15
---
# Somalia Drought Related - People Affected, Targeted & Reached by Location
**Publisher:** OCHA Regional Office for Southern and Eastern Africa (ROSEA) · **Source:** [HDX](https://data.humdata.org/dataset/somalia-pin-targeted-reached-by-location-and-cluster) · **License:** `cc-by` · **Updated:** 2025-09-16
---
## Abstract
Drought-affected areas and population in Somalia.
Each row is one tabular record. The data was last updated on HDX on 2025-09-16. Geographic scope: **SOM** (Somalia).
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Natural hazards and disaster risk |
| **Unit of observation** | Tabular records |
| **Rows (total)** | 76 |
| **Columns** | 10 (4 numeric, 6 categorical, 0 datetime) |
| **Train split** | 60 rows |
| **Test split** | 15 rows |
| **Geographic scope** | SOM |
| **Publisher** | OCHA Regional Office for Southern and Eastern Africa (ROSEA) |
| **HDX last updated** | 2025-09-16 |
---
## Variables
**Geographic** — `location` (Lower Shabelle, Gedo, Bari), `operational_priority` (range 1.0–3.0).
**Outcome / Measurement** — `overall_affected` (range 20881.0–1242175.0).
**Identifier / Metadata** — `unnamed_1` (SO23, SO26, SO16), `unnamed_2` (District, Jamaame, Laasqoray), `unnamed_3` (admin2Pcode, SO2804, SO1503), `unnamed_6` (range 18338.0–1290596.0), `unnamed_7` (range 2141.0–617072.0) and 2 others.
---
## Quick Start
```python
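from datasets import load_dataset

# Download both splits from the Hugging Face Hub
ds = load_dataset("electricsheepafrica/africa-somalia-pin-targeted-reached-by-location-and-cluster")

# Convert each split to a pandas DataFrame
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())  # printed explicitly so the preview also shows outside a notebook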
```
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `location` | object | 0.0% | Lower Shabelle, Gedo, Bari |
| `unnamed_1` | object | 0.0% | SO23, SO26, SO16 |
| `unnamed_2` | object | 0.0% | District, Jamaame, Laasqoray |
| `unnamed_3` | object | 0.0% | admin2Pcode, SO2804, SO1503 |
| `operational_priority` | float64 | 2.6% | 1.0 – 3.0 (mean 2.0676) |
| `overall_affected` | float64 | 2.6% | 20881.0 – 1242175.0 (mean 111490.5676) |
| `unnamed_6` | float64 | 2.6% | 18338.0 – 1290596.0 (mean 103211.2703) |
| `unnamed_7` | float64 | 2.6% | 2141.0 – 617072.0 (mean 85523.473) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-11 |
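
The dtype and null-percentage columns of the schema table can be recomputed after loading, which is a quick way to confirm that a local copy matches the card. The sketch below is illustrative only: `schema_report` and `rows_without_numbers` are hypothetical helper names, and the demo frame is made up so the snippet runs offline. The 2.6% null rate shared by the four numeric columns corresponds to about two of 76 rows; whether these are the same rows is an assumption that `rows_without_numbers` lets you check on the real data.

```python
import numpy as np
import pandas as pd

# Numeric columns as listed in the schema table
NUMERIC_COLS = ["operational_priority", "overall_affected", "unnamed_6", "unnamed_7"]


def schema_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return the dtype and null percentage of every column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "null_pct": (df.isna().mean() * 100).round(1),
    })


def rows_without_numbers(df: pd.DataFrame, numeric_cols=NUMERIC_COLS) -> pd.DataFrame:
    """Return the rows in which every numeric column is missing."""
    return df[df[numeric_cols].isna().all(axis=1)]


# Made-up demo frame (NOT real data); on the real dataset, pass `train` instead
demo = pd.DataFrame({
    "location": ["Region", "Gedo", "Bari", "Bay"],
    "operational_priority": [np.nan, 1.0, 2.0, 3.0],
    "overall_affected": [np.nan, 100.0, 200.0, 300.0],
    "unnamed_6": [np.nan, 90.0, 180.0, 270.0],
    "unnamed_7": [np.nan, 50.0, 60.0, 70.0],
})

report = schema_report(demo)
empty_rows = rows_without_numbers(demo)
print(report)
print(empty_rows)
```
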
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `operational_priority` | 1.0 | 3.0 | 2.0676 | 2.0 |
| `overall_affected` | 20881.0 | 1242175.0 | 111490.5676 | 67995.0 |
| `unnamed_6` | 18338.0 | 1290596.0 | 103211.2703 | 62691.5 |
| `unnamed_7` | 2141.0 | 617072.0 | 85523.473 | 56208.5 |
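
A table of this shape can be reproduced with a single pandas aggregation. This is a minimal sketch, not the code used to build the card: `numeric_summary` is a hypothetical helper and the demo values are made up. It relies on pandas skipping `NaN` by default in these aggregations.

```python
import numpy as np
import pandas as pd


def numeric_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Min, max, mean and median per numeric column (NaN values are skipped)."""
    numeric = df.select_dtypes(include="number")
    return numeric.agg(["min", "max", "mean", "median"]).T.round(4)


# Made-up demo frame (NOT real data); on the real dataset, pass the full table
demo = pd.DataFrame({
    "location": ["Gedo", "Bari", "Bay", "Sool"],
    "operational_priority": [1.0, 2.0, 3.0, np.nan],
    "overall_affected": [100.0, 200.0, 600.0, np.nan],
})

summary = numeric_summary(demo)
print(summary)
```
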
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. 4 column(s) were cast from string to numeric or datetime based on parse-success rate (>85% threshold). The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
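
The cleaning steps described above can be approximated in pandas as follows. This is a sketch under stated assumptions, not the Electric Sheep Africa pipeline: all function names are hypothetical, the demo frame is made up, datetime casting is omitted, and the split uses `DataFrame.sample`, which may select different rows than the published partitions even with seed 42.

```python
import re

import numpy as np
import pandas as pd

# Missing-value markers listed in the curation notes, compared case-insensitively
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def to_snake_case(name: str) -> str:
    """Lowercase a column name and join its alphanumeric runs with underscores."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", str(name).strip().lower()).strip("_")


def unify_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Replace known missing-value markers with NaN."""
    def fix(value):
        if isinstance(value, str) and value.strip().lower() in MISSING_MARKERS:
            return np.nan
        return value
    return df.apply(lambda col: col.map(fix))


def cast_numeric(df: pd.DataFrame, threshold: float = 0.85) -> pd.DataFrame:
    """Cast a column to numeric when more than `threshold` of its non-null values parse."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            continue
        non_null = out[col].dropna()
        if non_null.empty:
            continue
        parsed = pd.to_numeric(non_null, errors="coerce")
        if parsed.notna().mean() > threshold:
            out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


def split_train_test(df: pd.DataFrame, test_frac: float = 0.2, seed: int = 42):
    """Random 80/20 split with a fixed seed."""
    test = df.sample(frac=test_frac, random_state=seed)
    return df.drop(test.index), test


# Made-up raw frame (NOT real data)
raw = pd.DataFrame({
    "Location": ["Gedo", "Bari", "Bay", "Sool", "Sanaag"],
    "Overall Affected": ["100", "200", "N/A", "300", "400"],
})
clean = cast_numeric(unify_missing(raw.rename(columns=to_snake_case)))
train, test = split_train_test(clean)
print(clean.dtypes)
# Saving would look like this (needs pyarrow or fastparquet installed):
# train.to_parquet("train.parquet", compression="snappy")
```
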
---
## Limitations
- Data originates from OCHA Regional Office for Southern and Eastern Africa (ROSEA) and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/somalia-pin-targeted-reached-by-location-and-cluster) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_somalia_pin_targeted_reached_by_location_and_cluster,
title = {Somalia Drought Related - People Affected, Targeted \& Reached by Location},
author = {OCHA Regional Office for Southern and Eastern Africa (ROSEA)},
year = {2025},
url = {https://data.humdata.org/dataset/somalia-pin-targeted-reached-by-location-and-cluster},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider:
electricsheepafrica
Collected summary
Dataset introduction

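Construction
In the domain of natural hazards and disaster risk management, this dataset derives from raw data published by a regional office of the UN Office for the Coordination of Humanitarian Affairs (OCHA) and was systematically curated by the Electric Sheep Africa team. The raw data was retrieved through the CKAN API of the HDX platform and passed through a standardised cleaning process: missing-value markers were unified, column names were converted to snake_case, and columns meeting the parse threshold were automatically cast to numeric or datetime types. The dataset was then split 80/20 into training and test sets using a fixed random seed for reproducibility and stored as Snappy-compressed Parquet, yielding a tabular dataset suited to machine learning tasks.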
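Characteristics
The dataset focuses on assessing the impact of drought in Somalia. It is tabular, with 76 records and 10 feature columns covering geography, operational priority, and population impact. It combines numeric and categorical variables, for example location (such as Lower Shabelle and Gedo), operational priority (range 1.0 to 3.0), and the number of people affected (from 20881 to 1242175), and it retains the unnamed identifier fields from the raw data. The dataset is compact (fewer than 1K samples) and intended for small-sample analysis; uniform cleaning and format standardisation make it directly usable and consistent in machine learning applications.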
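Usage
For research on disaster response and population impact, the dataset can be loaded directly with the Hugging Face `datasets` library. Calling `load_dataset` with the dataset name returns the pre-split training and test sets, which can then be converted to pandas DataFrames for exploratory analysis or model training. The dataset suits tabular classification and related prediction tasks, such as assessing the severity of disaster impact from geographic and operational features or predicting the size of the affected population. Researchers can draw on the metadata and methodology notes on the original HDX page, together with the dataset's numeric summary and schema information, for deeper empirical analysis and model validation.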
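
With only 60 training rows, any classifier is worth comparing against a trivial reference before its results are interpreted. The sketch below computes a majority-class baseline for `operational_priority` using pandas alone. It is an illustrative assumption rather than a method from the dataset card: `majority_baseline` is a hypothetical helper, the choice of target column is one possible framing, and the demo frames are made up.

```python
import numpy as np
import pandas as pd


def majority_baseline(train: pd.DataFrame, test: pd.DataFrame,
                      target: str = "operational_priority"):
    """Predict the most frequent training label for every test row.

    Rows with a missing target are ignored on both sides.
    Returns the majority label and the resulting test accuracy.
    """
    train_y = train[target].dropna()
    test_y = test[target].dropna()
    majority = train_y.mode().iloc[0]
    accuracy = float((test_y == majority).mean())
    return majority, accuracy


# Made-up demo frames (NOT real data); on the real dataset, pass the two splits
demo_train = pd.DataFrame({"operational_priority": [1.0, 2.0, 2.0, 3.0, np.nan]})
demo_test = pd.DataFrame({"operational_priority": [2.0, 2.0, 1.0]})

majority, accuracy = majority_baseline(demo_train, demo_test)
print(majority, accuracy)
```

A model that does not clearly exceed this accuracy on the test split is unlikely to have learned anything useful from the features.
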
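Background and challenges
Background overview
In natural disaster and humanitarian response, data-driven decision support has become a core tool for assessing impact and allocating resources. The Somalia drought dataset was published in 2025 by the OCHA Regional Office for Southern and Eastern Africa and repackaged into a machine-learning-ready format by Electric Sheep Africa. It covers drought-affected areas within Somalia and records, in tabular form, key indicators for each location: the total affected population, the number of people targeted for assistance, and the number actually reached. Its central question is how to quantify the social impact of drought, giving humanitarian organisations a basis for impact assessment and for setting operational priorities, and so improving the efficiency and targeting of emergency response. This has practical value for disaster risk management and for refining humanitarian aid strategy.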
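Current challenges
The dataset addresses drought impact assessment and the optimisation of resource allocation. The challenges in this domain include data sparsity and heterogeneity: affected-population figures may follow inconsistent reporting standards across regions or be missing, which weakens model generalisation. The challenges in construction stem from how the raw data was collected and processed. Some field names are ambiguous (for example `unnamed_1` and `unnamed_2`), and the automated cleaning pipeline unifies missing-value markers and converts data types but cannot correct misreported values or definitional inconsistencies in the raw data. The dataset is also small (76 rows in total), which limits the training and validation of complex machine learning models.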
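Common scenarios
Classic use cases
In natural hazard and risk management, the dataset supports quantitative analysis of drought in Somalia. Its classic use is tabular classification: combining location, affected population, and operational priority to build predictive models that assess how severely different areas are affected. Researchers typically use it to train classifiers that identify high-risk areas or predict resource needs, helping humanitarian organisations refine their emergency response strategies.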
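Practical applications
In practice, the dataset directly serves humanitarian operations and policy-making. Non-governmental organisations and government agencies can use it to pinpoint the areas of Somalia hit hardest by drought and to adjust interventions such as food aid, medical support, and water supply. The data also supports long-term monitoring and evaluation, helping decision-makers refine recovery plans, use resources more efficiently, and ultimately strengthen community resilience to climate shocks.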
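Derived related work
Work derived from this dataset sits mainly at the intersection of humanitarian data analysis and machine learning. For example, research teams have used it to develop regional risk prediction models combined with geographic information systems for spatial visualisation, while other work focuses on data fusion, combining the dataset with satellite remote sensing or socio-economic indicators to improve the accuracy of integrated impact assessment. These results have enriched the methodology of data-driven disaster response.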
The content above was collected and summarised by 遇见数据集.



