electricsheepafrica/africa-idmc-idp-data-eth
Source: Hugging Face · Updated: 2026-04-06 · Indexed: 2026-04-12
Download link: https://hf-mirror.com/datasets/electricsheepafrica/africa-idmc-idp-data-eth
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- tabular-classification
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- conflict-violence
- displacement
- internally-displaced-persons-idp
- natural-disasters
- eth
pretty_name: "Ethiopia - Internal Displacements (New Displacements) – IDPs"
dataset_info:
splits:
- name: train
num_examples: 12
- name: test
num_examples: 3
---
# Ethiopia - Internal Displacements (New Displacements) – IDPs
**Publisher:** Internal Displacement Monitoring Centre (IDMC) · **Source:** [HDX](https://data.humdata.org/dataset/idmc-idp-data-eth) · **License:** `cc-by-igo` · **Updated:** 2026-03-18
---
## Abstract
The [Global Internal Displacement Database (GIDD)](http://www.internal-displacement.org/database/displacement-data), maintained by the [Internal Displacement Monitoring Centre (IDMC)](https://www.internal-displacement.org/), provides comprehensive, validated annual estimates of internal displacement worldwide. It defines internally displaced persons (IDPs) in line with the [1998 Guiding Principles](https://www.internal-displacement.org/internal-displacement/guiding-principles-on-internal-displacement/), as people or groups of people who have been forced or obliged to flee or to leave their homes or places of habitual residence, in particular as a result of armed conflict, or to avoid the effects of armed conflict, situations of generalized violence, violations of human rights, or natural or human-made disasters and who have not crossed an international border.
The GIDD tracks two primary metrics. "People Displaced", the population "stock" figure, is the total number of people living in displacement at year-end. "New Displacement", the population "flow" figure, counts new displacement incidents rather than individual people, because the same person may be displaced more than once. The database is a resource for studying long-term trends using validated displacement figures worldwide. For details and the complete API specification, see the official documentation at https://www.internal-displacement.org/database/api-documentation/.
"Internally displaced persons - IDPs" refers to the number of people living in displacement as of the end of each year.
"Internal displacements (New Displacements)" refers to the number of new cases or incidents of displacement recorded, rather than the number of people displaced. This is done because people may have been displaced more than once.
Each row in this dataset represents a country-level aggregate. The data was last updated on HDX on 2026-03-18. Geographic scope: **ETH**.
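Because the flow column counts displacement events while the stock column counts people at year-end, the two measure different things, and a year's flow can exceed its year-end stock. The sketch below illustrates this on a toy frame. The three rows are invented for illustration and are not values from this dataset; only the column names `year`, `new_displacement`, and `total_displacement` come from the schema.

```python
import pandas as pd

# Hypothetical values for illustration only; NOT taken from the dataset.
toy = pd.DataFrame({
    "year": [2001, 2002, 2003],
    "new_displacement": [100, 500, 50],      # flow: displacement events during the year
    "total_displacement": [300, 400, 420],   # stock: people displaced at year-end
})

# Year-over-year change in the stock (NaN for the first year).
toy["stock_change"] = toy["total_displacement"].diff()

# Flag years where recorded events outnumber people still displaced at year-end,
# which can happen when people return home or are displaced more than once.
toy["flow_exceeds_stock"] = toy["new_displacement"] > toy["total_displacement"]

print(toy)
```

In the second toy year the flow (500) is larger than both the year-end stock (400) and the change in stock (100), which is why the two columns should not be summed or compared as if they counted the same thing.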
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Conflict and security |
| **Unit of observation** | Country-level aggregates |
| **Rows (total)** | 16 |
| **Columns** | 9 (5 numeric, 4 categorical, 0 datetime) |
| **Train split** | 12 rows |
| **Test split** | 3 rows |
| **Geographic scope** | ETH |
| **Publisher** | Internal Displacement Monitoring Centre (IDMC) |
| **HDX last updated** | 2026-03-18 |
---
## Variables
**Geographic:** `iso3` (ETH), `country_name` (Ethiopia), `year` (range 2009.0–2024.0), `new_displacement` (range 0.0–5142356.0), `new_displacement_rounded` (range 50000.0–5142000.0), `total_displacement` (range 257563.0–3851840.0), `total_displacement_rounded` (range 258000.0–3852000.0).
**Identifier / Metadata:** `esa_source` (HDX), `esa_processed` (2026-04-06).
---
## Quick Start
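```python
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub (requires network access).
ds = load_dataset("electricsheepafrica/africa-idmc-idp-data-eth")
# Convert each split to a pandas DataFrame.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()
print(train.shape)   # (rows, columns)
print(train.head())  # a bare train.head() only displays in a notebook
```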
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `iso3` | object | 0.0% | ETH |
| `country_name` | object | 0.0% | Ethiopia |
| `year` | int64 | 0.0% | 2009.0 – 2024.0 (mean 2016.5) |
| `new_displacement` | int64 | 0.0% | 0.0 – 5142356.0 (mean 977311.125) |
| `new_displacement_rounded` | float64 | 12.5% | 50000.0 – 5142000.0 (mean 1116928.5714) |
| `total_displacement` | int64 | 0.0% | 257563.0 – 3851840.0 (mean 1377630.0) |
| `total_displacement_rounded` | int64 | 0.0% | 258000.0 – 3852000.0 (mean 1377562.5) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-06 |
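The Null % column can be recomputed with a short helper, sketched below. `null_percentages` is a hypothetical name, and the toy frame holds placeholder values rather than dataset values. If the total is 16 rows as stated above, a 12.5% null share corresponds to 2 missing values. The toy frame also shows why `new_displacement_rounded` appears as `float64`: with pandas' default dtypes, a numeric column that contains `NaN` is stored as floating point.

```python
import numpy as np
import pandas as pd


def null_percentages(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values per column, rounded to one decimal."""
    return (df.isna().mean() * 100).round(1)


# Toy frame with 16 rows; the numbers are placeholders, NOT dataset values.
toy = pd.DataFrame({
    "total_displacement_rounded": pd.Series(range(16), dtype="int64") * 1000,
    "new_displacement_rounded": [np.nan, np.nan] + [1000.0] * 14,
})

pct = null_percentages(toy)
print(pct)         # per-column share of missing values
print(toy.dtypes)  # the column holding NaN is float64, the complete one is int64
```

The same helper can be applied to the frames returned by `to_pandas()` in the Quick Start to check the figures in the table.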
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `year` | 2009.0 | 2024.0 | 2016.5 | 2016.5 |
| `new_displacement` | 0.0 | 5142356.0 | 977311.125 | 341671.5 |
| `new_displacement_rounded` | 50000.0 | 5142000.0 | 1116928.5714 | 556000.0 |
| `total_displacement` | 257563.0 | 3851840.0 | 1377630.0 | 764316.0 |
| `total_displacement_rounded` | 258000.0 | 3852000.0 | 1377562.5 | 764000.0 |
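A table like the one above can be reproduced with `DataFrame.agg`, as in the following sketch. `numeric_summary` is a hypothetical helper and the toy values are invented for illustration. pandas skips `NaN` by default when aggregating, so statistics for a column with missing values are computed over the non-null rows only.

```python
import numpy as np
import pandas as pd


def numeric_summary(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Min, max, mean, and median per column, one row per column."""
    return df[columns].agg(["min", "max", "mean", "median"]).T


# Hypothetical values for illustration only; NOT taken from the dataset.
toy = pd.DataFrame({
    "new_displacement": [1, 2, 3, 10],
    "new_displacement_rounded": [np.nan, 2.0, 4.0, 6.0],
})

summary = numeric_summary(toy, ["new_displacement", "new_displacement_rounded"])
print(summary)
```

In the table above, the means of the displacement columns sit well above their medians (for example 977311.125 against 341671.5 for `new_displacement`), which points to a right-skewed distribution driven by a few very large years; the median is the more robust summary in that case.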
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
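The curation steps described above can be approximated as follows. This is a minimal sketch under stated assumptions, not the actual curation script, which the card does not show: the helper names (`to_snake_case`, `clean`, `split_80_20`) are hypothetical, case-insensitive matching of the missing-value markers is an assumption, and `DataFrame.sample` is only one possible way to do a seeded 80/20 split. The toy input is invented.

```python
import re

import pandas as pd

# Markers listed in the Curation paragraph; case-insensitive matching is an assumption.
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def to_snake_case(name: str) -> str:
    """Lowercase a column name and join its alphanumeric parts with underscores."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Rename columns to snake_case and turn text missing-value markers into NaN."""
    out = df.rename(columns=to_snake_case)
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            continue  # numeric columns cannot contain text markers
        as_text = out[col].astype(str).str.strip().str.lower()
        out[col] = out[col].where(~as_text.isin(MISSING_MARKERS))
    return out


def split_80_20(df: pd.DataFrame, seed: int = 42):
    """Seeded 80/20 split; sampling without replacement is an assumption."""
    train = df.sample(frac=0.8, random_state=seed)
    test = df.drop(train.index)
    return train, test


# Hypothetical raw input for illustration only; NOT taken from the dataset.
raw = pd.DataFrame({
    "ISO3": ["ETH"] * 10,
    "New Displacement": ["100", "N/A", "250", "-", "75", "10", "20", "30", "40", "50"],
})

tidy = clean(raw)
train, test = split_80_20(tidy)
print(list(tidy.columns), len(train), len(test))
```

Writing the result would then be a call such as `train.to_parquet("train.parquet", compression="snappy")`, which needs a Parquet engine such as `pyarrow` installed. The sketch does not convert cleaned text columns to numeric types and does not handle camelCase names.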
---
## Limitations
- Data originates from the Internal Displacement Monitoring Centre (IDMC) and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/idmc-idp-data-eth) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_idmc_idp_data_eth,
title = {Ethiopia - Internal Displacements (New Displacements) – IDPs},
author = {Internal Displacement Monitoring Centre (IDMC)},
year = {2026},
url = {https://data.humdata.org/dataset/idmc-idp-data-eth},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider: electricsheepafrica



