electricsheepafrica/africa-3w-operational-presence-december-2017
Source: Hugging Face · Updated 2026-04-07 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-3w-operational-presence-december-2017
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: other
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- other
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- who-is-doing-what-and-where-3w-4w-5w
- eth
pretty_name: "3W Operational Presence December 2017"
dataset_info:
  splits:
  - name: train
    num_examples: 3877
  - name: test
    num_examples: 969
---
# 3W Operational Presence December 2017
**Publisher:** OCHA Ethiopia · **Source:** [HDX](https://data.humdata.org/dataset/3w-operational-presence-december-2017) · **License:** `other-pd-nr` · **Updated:** 2024-09-13
---
## Abstract
The Who Does What Where (3W) dataset is a core humanitarian dataset used for coordination. This release records the operational presence of humanitarian partners in Ethiopia at admin3 (woreda) level, by cluster.
Each row represents one observation for a subnational administrative unit. The data was last updated on HDX on 2024-09-13. Geographic scope: **ETH**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
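
As an illustration of how operational-presence rows might be summarised, the sketch below counts distinct organizations per region and sector. It assumes each row is a dictionary keyed by the column names listed in this card (iterating over `ds["train"]` yields such dictionaries). The function name `count_partners` and the sample rows are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def count_partners(rows, keys=("region", "sector")):
    """Count distinct organizations for each combination of `keys`."""
    seen = defaultdict(set)
    for row in rows:
        org = row.get("organization")
        if org is None:  # skip rows that do not name an organization
            continue
        group = tuple(row.get(k) for k in keys)
        seen[group].add(org)
    return {group: len(orgs) for group, orgs in seen.items()}


# Hypothetical rows, for illustration only
sample_rows = [
    {"organization": "UNICEF", "region": "Somali", "sector": "WASH"},
    {"organization": "SCI", "region": "Somali", "sector": "WASH"},
    {"organization": "UNICEF", "region": "Somali", "sector": "WASH"},
    {"organization": "NDRMC", "region": "Oromia", "sector": "Food"},
]
partner_counts = count_partners(sample_rows)
print(partner_counts)
```

Passing `keys=("woreda_code", "sector")` instead would give the same count at admin3 level.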
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Humanitarian and development data |
| **Unit of observation** | Subnational administrative unit observations |
| **Rows (total)** | 4,847 (the train and test splits listed below sum to 4,846) |
| **Columns** | 12 (0 numeric, 12 categorical, 0 datetime) |
| **Train split** | 3,877 rows |
| **Test split** | 969 rows |
| **Geographic scope** | ETH |
| **Publisher** | OCHA Ethiopia |
| **HDX last updated** | 2024-09-13 |
---
## Variables
**Geographic:** `region` (Somali, Oromia, SNNP), `zone` (Borena, East Harerge, Bale), `woreda` (Moyale, Babile, Gursum), `woreda_code` (ET050201, ET041216, ET041210).
**Identifier / Metadata:** `esa_source`, `esa_processed`.
**Other:** `organization` (UNICEF, SCI, NDRMC), `organization_type` (International NGO, UN Agency, Governement), `sector` (WASH, Food, Agriculture), `activities` (Water, Hygiene, Sanitation), `project_status` (Completed, Ongoing, Planned), `implementing_partner_s` (RWB, DPPB, SCI).
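
The sample `woreda_code` values above look like hierarchical place codes. The sketch below splits such a code into region, zone and woreda prefixes. The layout it relies on (`ET` followed by three two-digit groups) is an assumption inferred from the sample values, not something this card documents, and `split_woreda_code` is a hypothetical helper.

```python
import re

# Assumed layout: "ET" + 2-digit region + 2-digit zone + 2-digit woreda
WOREDA_CODE = re.compile(r"^ET(\d{2})(\d{2})(\d{2})$")


def split_woreda_code(code):
    """Return cumulative region, zone and woreda codes, or None if the code does not fit."""
    if not isinstance(code, str):
        return None  # missing values are passed through as None
    match = WOREDA_CODE.match(code.strip())
    if match is None:
        return None
    region, zone, woreda = match.groups()
    return {
        "region_code": "ET" + region,
        "zone_code": "ET" + region + zone,
        "woreda_code": "ET" + region + zone + woreda,
    }


parts = split_woreda_code("ET050201")
print(parts)
```

Codes that do not match the assumed pattern return `None` rather than raising, since about 2% of `woreda_code` values are missing.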
---
## Quick Start
```python
from datasets import load_dataset

# Download both splits from the Hugging Face Hub
ds = load_dataset("electricsheepafrica/africa-3w-operational-presence-december-2017")
# Convert each split to a pandas DataFrame
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()
print(train.shape)
print(train.head())
```
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `organization` | object | 0.0% | UNICEF, SCI, NDRMC |
| `organization_type` | object | 0.0% | International NGO, UN Agency, Governement |
| `region` | object | 0.0% | Somali, Oromia, SNNP |
| `zone` | object | 0.1% | Borena, East Harerge, Bale |
| `woreda` | object | 1.2% | Moyale, Babile, Gursum |
| `woreda_code` | object | 2.2% | ET050201, ET041216, ET041210 |
| `sector` | object | 0.0% | WASH, Food, Agriculture |
| `activities` | object | 25.0% | Water, Hygiene, Sanitation |
| `project_status` | object | 1.7% | Completed, Ongoing, Planned |
| `implementing_partner_s` | object | 0.0% | RWB, DPPB, SCI |
| `esa_source` | object | 0.0% | |
| `esa_processed` | object | 0.0% | |
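
The null percentages in the table above could be recomputed with something like the following sketch. It assumes rows are dictionaries in which missing values appear as `None`; the helper name `null_percentages` and the sample rows are hypothetical.

```python
def null_percentages(rows, columns):
    """Return the percentage of missing (None) values for each column."""
    rows = list(rows)
    if not rows:
        return {col: 0.0 for col in columns}
    result = {}
    for col in columns:
        missing = sum(1 for row in rows if row.get(col) is None)
        result[col] = round(100.0 * missing / len(rows), 1)
    return result


# Hypothetical rows, for illustration only
sample_rows = [
    {"sector": "WASH", "activities": "Water"},
    {"sector": "WASH", "activities": None},
    {"sector": "Food", "activities": "Hygiene"},
    {"sector": "Agriculture", "activities": "Sanitation"},
]
null_pct = null_percentages(sample_rows, ["sector", "activities"])
print(null_pct)
```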
---
## Numeric Summary
_No numeric columns._
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. 11,441 exact duplicate rows were removed. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
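
The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not ESA's actual pipeline. It makes several assumptions: marker matching is case-insensitive, `None` stands in for `NaN`, the split is a simple shuffle, and the raw headers and rows are hypothetical.

```python
import random
import re

MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def to_snake_case(name):
    """Lowercase a column name and collapse non-alphanumeric runs into '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def clean_value(value):
    """Map common missing-value markers to None (case-insensitive: an assumption)."""
    if isinstance(value, str) and value.strip().lower() in MISSING_MARKERS:
        return None
    return value


def curate(rows, test_fraction=0.2, seed=42):
    """Rename columns, unify missing markers, drop exact duplicates, then split."""
    cleaned, seen = [], set()
    for row in rows:
        new_row = {to_snake_case(k): clean_value(v) for k, v in row.items()}
        key = tuple(sorted(new_row.items(), key=lambda kv: kv[0]))
        if key in seen:  # exact duplicate of an earlier row
            continue
        seen.add(key)
        cleaned.append(new_row)
    shuffled = cleaned[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical raw rows, for illustration only
raw_rows = [
    {"Organization": "UNICEF", "Project Status": "Ongoing"},
    {"Organization": "UNICEF", "Project Status": "Ongoing"},
    {"Organization": "SCI", "Project Status": "N/A"},
    {"Organization": "NDRMC", "Project Status": "Completed"},
    {"Organization": "SCI", "Project Status": "Planned"},
    {"Organization": "UNICEF", "Project Status": "Planned"},
]
train_rows, test_rows = curate(raw_rows)
print(len(train_rows), len(test_rows))
```

Duplicates are removed before splitting, so the same row cannot land in both partitions.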
---
## Limitations
- Data originates from OCHA Ethiopia and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- The following columns have >20% missing values and should be treated with caution in modelling: `activities`.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/3w-operational-presence-december-2017) for the publisher's own methodology notes and caveats.
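
One possible way to handle the sparsely populated `activities` column before modelling is to make missingness an explicit category, as sketched below. The placeholder label `"Unspecified"` and the helper `fill_missing` are hypothetical; dropping the column or the affected rows may suit some analyses better.

```python
def fill_missing(rows, column="activities", placeholder="Unspecified"):
    """Return copies of the rows with missing `column` values replaced by a placeholder."""
    return [
        {**row, column: placeholder} if row.get(column) is None else dict(row)
        for row in rows
    ]


# Hypothetical rows, for illustration only
sample_rows = [
    {"sector": "WASH", "activities": "Water"},
    {"sector": "Food", "activities": None},
]
filled = fill_missing(sample_rows)
print(filled)
```

The input rows are left unchanged, so the original missingness pattern remains available for inspection.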
---
## Citation
```bibtex
@dataset{hdx_africa_3w_operational_presence_december_2017,
title = {3W Operational Presence December 2017},
author = {OCHA Ethiopia},
year = {2024},
url = {https://data.humdata.org/dataset/3w-operational-presence-december-2017},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider: electricsheepafrica



