electricsheepafrica/africa-3w-operational-presence-january-to-june-2019
Source: Hugging Face · Updated: 2026-04-10 · Indexed: 2026-04-12
Download link: https://hf-mirror.com/datasets/electricsheepafrica/africa-3w-operational-presence-january-to-june-2019
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- operational-presence
- who-is-doing-what-and-where-3w-4w-5w
- eth
pretty_name: "3W Operational Presence January to June 2019"
dataset_info:
  splits:
  - name: train
    num_examples: 2440
  - name: test
    num_examples: 610
---
# 3W Operational Presence January to June 2019
**Publisher:** OCHA Ethiopia · **Source:** [HDX](https://data.humdata.org/dataset/3w-operational-presence-january-to-june-2019) · **License:** `cc-by` · **Updated:** 2025-04-25
---
## Abstract
The file records the operational presence of implementing partners by woreda (a district-level administrative unit in Ethiopia) and sector from January to June 2019.
Each row is one subnational observation: an organization's presence in a given woreda and sector. Data was last updated on HDX on 2025-04-25. Geographic scope: **ETH** (Ethiopia).
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Humanitarian and development data |
| **Unit of observation** | Subnational administrative unit observations |
| **Rows (total)** | 3,050 |
| **Columns** | 13 (0 numeric, 13 categorical, 0 datetime) |
| **Train split** | 2,440 rows |
| **Test split** | 610 rows |
| **Geographic scope** | ETH |
| **Publisher** | OCHA Ethiopia |
| **HDX last updated** | 2025-04-25 |
---
## Variables
**Geographic:** `region` (Oromia, Somali, SNNP), `zone` (West Guji, East Hararge, Gedeo), `woreda` (Kochere Gedeb, Kercha, Gelana (West Guji)), `woreda_code` (ET070506, ET041406, ET041217), `zonecode` (ET0412, ET0410, ET0705).
**Identifier / Metadata:** `esa_source`, `esa_processed`.
**Other:** `organization` (NDRMC, UNICEF, Government), `organization_type` (Government, International NGO, UN Agency), `sector` (WASH, Food, Nutrition), `activities` (Distribution of food to targeted beneficiaries, Distribution of NFIs, 5. Mobile health and nutrition team (MHNT)), `project_status` (On-going, Completed, Planned), `implementing_partner_s`.
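
The sample codes hint at a nesting that can be checked programmatically: `ET041217` starts with `ET0412`, and `ET070506` starts with `ET0705`. The sketch below tests whether each `woreda_code` begins with the `zonecode` of the same row and returns the rows that do not. The prefix relationship is an assumption inferred from the sample values, not something the publisher documents here; the toy rows and the helper name `prefix_mismatches` are hypothetical.

```python
import pandas as pd

# Toy rows for illustration only; the code pairings are hypothetical.
toy = pd.DataFrame(
    {
        "woreda_code": ["ET041217", "ET070506", "ET041406", None],
        "zonecode": ["ET0412", "ET0705", "ET0410", None],
    }
)


def prefix_mismatches(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose woreda_code does not start with the row's zonecode."""
    # Rows with a missing code cannot be checked, so drop them first.
    known = df.dropna(subset=["woreda_code", "zonecode"])
    starts = pd.Series(
        [
            str(woreda).startswith(str(zone))
            for woreda, zone in zip(known["woreda_code"], known["zonecode"])
        ],
        index=known.index,
        dtype=bool,
    )
    return known[~starts]


mismatches = prefix_mismatches(toy)
print(mismatches)
```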
---
## Quick Start
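```python
from datasets import load_dataset

# Load both splits from the Hugging Face Hub.
ds = load_dataset("electricsheepafrica/africa-3w-operational-presence-january-to-june-2019")
# Convert each split to a pandas DataFrame.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()
print(train.shape)
print(train.head())  # print explicitly so the preview also shows outside a notebook
```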
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `organization` | object | 0.0% | NDRMC, UNICEF, Government |
| `organization_type` | object | 0.0% | Government, International NGO, UN Agency |
| `region` | object | 0.0% | Oromia, Somali, SNNP |
| `zone` | object | 0.7% | West Guji, East Hararge, Gedeo |
| `woreda` | object | 1.7% | Kochere Gedeb, Kercha, Gelana (West Guji) |
| `woreda_code` | object | 2.1% | ET070506, ET041406, ET041217 |
| `zonecode` | object | 2.1% | ET0412, ET0410, ET0705 |
| `sector` | object | 0.0% | WASH, Food, Nutrition |
| `activities` | object | 22.9% | Distribution of food to targeted beneficiaries, Distribution of NFIs, 5. Mobile health and nutrition team (MHNT) |
| `project_status` | object | 0.0% | On-going, Completed, Planned |
| `implementing_partner_s` | object | 0.0% | |
| `esa_source` | object | 0.0% | |
| `esa_processed` | object | 0.0% | |
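
Because every column is categorical, a typical first step is a count rather than a numeric summary. The following is a minimal sketch, assuming the column names in the schema above, that counts how many distinct organizations are present per region and sector. The toy rows are illustrative and not taken from the dataset, and the helper name `organizations_per_region_sector` is hypothetical; with the real data, pass the `train` DataFrame from Quick Start instead.

```python
import pandas as pd

# Toy rows shaped like the schema; values are illustrative only.
toy = pd.DataFrame(
    {
        "organization": ["NDRMC", "UNICEF", "UNICEF", "NDRMC", "UNICEF", "Government"],
        "region": ["Oromia", "Oromia", "Somali", "Somali", "Oromia", "Oromia"],
        "sector": ["Food", "WASH", "Nutrition", "Food", "WASH", "WASH"],
    }
)


def organizations_per_region_sector(df: pd.DataFrame) -> pd.DataFrame:
    """Count distinct organizations for each (region, sector) pair."""
    return (
        df.groupby(["region", "sector"])["organization"]
        .nunique()  # repeated rows for the same organization count once
        .rename("n_organizations")
        .reset_index()
    )


presence = organizations_per_region_sector(toy)
print(presence)
```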
---
## Numeric Summary
_No numeric columns._
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
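
The steps above can be sketched in pandas as follows. This is an illustration under stated assumptions, not the pipeline Electric Sheep Africa actually ran: the raw column headers, the helper names, the case-insensitive marker matching, and the use of `DataFrame.sample` for the split are all hypothetical choices. The Parquet write is left as a comment because it needs `pyarrow` or `fastparquet` installed.

```python
import re

import pandas as pd

# Missing-value markers listed in the Curation section (matched case-insensitively here).
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def to_snake_case(name: str) -> str:
    """Lowercase a header and collapse non-alphanumeric runs into underscores."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Rename columns to snake_case and replace missing-value markers with NaN."""
    out = df.rename(columns=to_snake_case)
    for col in out.columns:
        text = out[col].astype(str).str.strip().str.lower()
        # mask() sets matching cells to NaN; only whole-cell matches count,
        # so a value such as "On-going" is left alone.
        out[col] = out[col].mask(text.isin(MISSING_MARKERS))
    return out


def split_80_20(df: pd.DataFrame, seed: int = 42):
    """Draw 80% of rows for train with a fixed seed; the rest become test."""
    train = df.sample(frac=0.8, random_state=seed)
    test = df.drop(train.index)
    return train, test


# Hypothetical raw headers and values, for illustration only.
raw = pd.DataFrame(
    {
        "Implementing Partner(s)": [
            "UNICEF", "N/A", "NDRMC", "-", "Government",
            "unknown", "UNICEF", "#N/A", "NDRMC", "No Data",
        ],
        "Project Status": ["On-going"] * 5 + ["Completed"] * 3 + ["Planned"] * 2,
    }
)

cleaned = clean(raw)
train, test = split_80_20(cleaned)
print(cleaned.columns.tolist(), len(train), len(test))
# train.to_parquet("train.parquet", compression="snappy")
```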
---
## Limitations
- Data originates from OCHA Ethiopia and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- The following columns have >20% missing values and should be treated with caution in modelling: `activities`.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/3w-operational-presence-january-to-june-2019) for the publisher's own methodology notes and caveats.
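
The missing-value caveat above can be re-checked on any split. A minimal sketch, assuming a pandas DataFrame such as `train` from Quick Start: it computes the percentage of nulls per column and lists the columns above a threshold. The toy frame and the helper names `null_percentages` and `columns_above` are hypothetical.

```python
import pandas as pd

# Toy frame for illustration; half of the `activities` values are missing.
toy = pd.DataFrame(
    {
        "sector": ["WASH", "Food", "Nutrition", "Food"],
        "activities": [
            "Distribution of NFIs",
            None,
            None,
            "Distribution of food to targeted beneficiaries",
        ],
    }
)


def null_percentages(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values per column, rounded to one decimal."""
    return df.isna().mean().mul(100).round(1)


def columns_above(df: pd.DataFrame, threshold_pct: float = 20.0) -> list:
    """Names of columns whose missing share exceeds threshold_pct."""
    pct = null_percentages(df)
    return pct[pct > threshold_pct].index.tolist()


null_pct = null_percentages(toy)
flagged = columns_above(toy)
print(null_pct)
print(flagged)
```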
---
## Citation
```bibtex
@dataset{hdx_africa_3w_operational_presence_january_to_june_2019,
title = {3W Operational Presence January to June 2019},
author = {OCHA Ethiopia},
year = {2025},
url = {https://data.humdata.org/dataset/3w-operational-presence-january-to-june-2019},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*