electricsheepafrica/africa-idmc-idp-data-nga
Download link: https://hf-mirror.com/datasets/electricsheepafrica/africa-idmc-idp-data-nga
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- tabular-classification
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- conflict-violence
- displacement
- internally-displaced-persons-idp
- natural-disasters
- nga
pretty_name: "Nigeria - Internal Displacements (New Displacements) – IDPs"
dataset_info:
splits:
- name: train
num_examples: 12
- name: test
num_examples: 3
---
# Nigeria - Internal Displacements (New Displacements) – IDPs
**Publisher:** Internal Displacement Monitoring Centre (IDMC) · **Source:** [HDX](https://data.humdata.org/dataset/idmc-idp-data-nga) · **License:** `cc-by-igo` · **Updated:** 2026-03-18
---
## Abstract
The [Global Internal Displacement Database (GIDD)](http://www.internal-displacement.org/database/displacement-data), maintained by the [Internal Displacement Monitoring Centre (IDMC)](https://www.internal-displacement.org/), provides comprehensive, validated annual estimates of internal displacement worldwide. It defines internally displaced persons (IDPs) in line with the [1998 Guiding Principles](https://www.internal-displacement.org/internal-displacement/guiding-principles-on-internal-displacement/), as people or groups of people who have been forced or obliged to flee or to leave their homes or places of habitual residence, in particular as a result of armed conflict, or to avoid the effects of armed conflict, situations of generalized violence, violations of human rights, or natural or human-made disasters and who have not crossed an international border.
The GIDD tracks two primary metrics. "People Displaced", the population "stock" figure, is the total number of people living in displacement at year-end. "New Displacement", the population "flow" figure, counts new displacement incidents rather than individual people, because the same person may be displaced more than once. The database is a key resource for long-term trends and validated displacement figures worldwide. For details and the complete API specification, see the official documentation at https://www.internal-displacement.org/database/api-documentation/.
"Internally displaced persons - IDPs" refers to the number of people living in displacement as of the end of each year.
"Internal displacements (New Displacements)" refers to the number of new cases or incidents of displacement recorded, rather than the number of people displaced. This is done because people may have been displaced more than once.
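
The difference between counting displacement events and counting people can be illustrated with a minimal sketch. The event log below is a hypothetical toy example, not IDMC data, and the variable names are assumptions chosen for illustration.

```python
# Toy event log (hypothetical, not real IDMC data): each tuple is one
# displacement event, recorded as (person_id, year).
events = [
    ("person_a", 2020),
    ("person_a", 2020),  # the same person displaced a second time in 2020
    ("person_b", 2020),
]

# Flow-style count: every event counts, so repeat displacements add up.
new_displacements = len(events)

# Head count: each person counts once, however often they were displaced.
distinct_people = len({person for person, _ in events})

print(new_displacements, distinct_people)
```

Here three events involve only two people, which is why a "New Displacements" figure should not be read as a number of individuals.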
Each row in this dataset is a country-level aggregate for a single year. The data was last updated on HDX on 2026-03-18. Geographic scope: **NGA**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Conflict and security |
| **Unit of observation** | Country-level aggregates |
| **Rows (total)** | 16 |
| **Columns** | 9 (5 numeric, 4 categorical, 0 datetime) |
| **Train split** | 12 rows |
| **Test split** | 3 rows |
| **Geographic scope** | NGA |
| **Publisher** | Internal Displacement Monitoring Centre (IDMC) |
| **HDX last updated** | 2026-03-18 |
---
## Variables
**Geographic**: `iso3` (NGA), `country_name` (Nigeria).

**Temporal**: `year` (2009 to 2024).

**Displacement measures**: `new_displacement` (5,000 to 975,300), `new_displacement_rounded` (5,000 to 975,000), `total_displacement` (1,075,300 to 3,645,757), `total_displacement_rounded` (1,075,000 to 3,646,000).

**Identifier / Metadata**: `esa_source` (HDX), `esa_processed` (2026-04-07).
---
## Quick Start
```python
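from datasets import load_dataset

# Download the dataset from the Hugging Face Hub (train and test splits).
ds = load_dataset("electricsheepafrica/africa-idmc-idp-data-nga")

# Convert each split to a pandas DataFrame.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())  # print explicitly so this also works outside a notebook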
```
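
Because the train/test split is random rather than chronological, year-ordered analysis may need both splits recombined. The sketch below shows one way to do that, using small toy frames as stand-ins for `train` and `test`; the values are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for the `train` and `test` frames from the Quick Start.
# The numbers are invented for illustration, not taken from the dataset.
train = pd.DataFrame({"year": [2011, 2009], "new_displacement": [300, 100]})
test = pd.DataFrame({"year": [2010], "new_displacement": [200]})

# Stack both splits and restore chronological order.
full = (
    pd.concat([train, test], ignore_index=True)
    .sort_values("year")
    .reset_index(drop=True)
)
print(full)
```

With the real data, the same `pd.concat` call should apply to the frames returned by `to_pandas()`.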
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `iso3` | object | 0.0% | NGA |
| `country_name` | object | 0.0% | Nigeria |
| `year` | int64 | 0.0% | 2009 – 2024 (mean 2016.5) |
| `new_displacement` | int64 | 0.0% | 5000 – 975300 (mean 323050.0) |
| `new_displacement_rounded` | int64 | 0.0% | 5000 – 975000 (mean 323000.0) |
| `total_displacement` | float64 | 25.0% | 1075300.0 – 3645757.0 (mean 2604714.25) |
| `total_displacement_rounded` | float64 | 25.0% | 1075000.0 – 3646000.0 (mean 2604750.0) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-07 |
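
The `total_displacement` columns are typed `float64` while the other counts are `int64`. This is likely because NumPy integer columns cannot hold `NaN`. As a minimal sketch, assuming the non-missing values are whole numbers, the null share can be checked and the column cast to the nullable `Int64` dtype in pandas. The frame below uses invented values.

```python
import pandas as pd

# Toy frame (invented values) with a gap, mimicking a float64 count column.
df = pd.DataFrame({"total_displacement": [1500.0, None, 2500.0]})

# Share of missing values per column, in percent.
null_pct = df.isna().mean() * 100

# Nullable integer dtype: whole-number counts with <NA> for the gaps.
df["total_displacement"] = df["total_displacement"].astype("Int64")

print(null_pct.round(1).to_dict())
print(df.dtypes.to_dict())
```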
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `year` | 2009.0 | 2024.0 | 2016.5 | 2016.5 |
| `new_displacement` | 5000.0 | 975300.0 | 323050.0 | 285112.5 |
| `new_displacement_rounded` | 5000.0 | 975000.0 | 323000.0 | 285000.0 |
| `total_displacement` | 1075300.0 | 3645757.0 | 2604714.25 | 2656520.0 |
| `total_displacement_rounded` | 1075000.0 | 3646000.0 | 2604750.0 | 2656500.0 |
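
As a quick sanity check on the `year` row, the sketch below assumes exactly one row per year from 2009 through 2024. This is consistent with the 16 total rows reported above, but the card does not state it explicitly.

```python
from statistics import mean, median

# Assumption: exactly one row per year from 2009 through 2024 (16 rows).
years = list(range(2009, 2025))

print(len(years), mean(years), median(years))
```

Under that assumption, both the mean and the median come out at 2016.5, matching the table.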
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
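
The cleaning and splitting steps described above can be sketched with the standard library alone. This is not the curation code used by Electric Sheep Africa. The helper names are hypothetical, case-insensitive marker matching is an assumption, `None` stands in for `NaN`, and the shuffle-then-slice split is only one plausible reading of "80/20 with seed 42".

```python
import random
import re

# The missing-value markers listed above, compared case-insensitively
# (case-insensitive matching is an assumption).
MISSING = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}

def to_snake_case(name: str) -> str:
    """Lowercase a column name and join its words with underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")

def clean_value(value):
    """Map a known missing-value marker to None, else return the value."""
    if isinstance(value, str) and value.strip().lower() in MISSING:
        return None
    return value

def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle with a fixed seed, then cut into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

print(to_snake_case("New Displacement"))  # new_displacement
print(clean_value("N/A"))                 # None
train, test = split_rows(range(10))
print(len(train), len(test))              # 8 2
```

A fixed seed makes the partition reproducible, though the exact rows in each split depend on the shuffling routine, so this sketch would not necessarily reproduce the published split.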
---
## Limitations
- Data originates from the Internal Displacement Monitoring Centre (IDMC) and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- The following columns have >20% missing values and should be treated with caution in modelling: `total_displacement`, `total_displacement_rounded`.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/idmc-idp-data-nga) for the publisher's own methodology notes and caveats.
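
One way to handle the sparse `total_displacement` columns is either to restrict modelling to rows where the value is present or to flag the gaps explicitly. The sketch below shows both options on a toy frame with invented values. The `total_displacement_missing` column name is a hypothetical choice.

```python
import pandas as pd

# Toy frame with invented values; None marks a missing stock figure.
df = pd.DataFrame(
    {
        "year": [2009, 2010, 2011, 2012],
        "new_displacement": [100, 200, 300, 400],
        "total_displacement": [None, 1500.0, 1800.0, 2100.0],
    }
)

# Option 1: keep only the rows where the stock figure is present.
complete = df.dropna(subset=["total_displacement"])

# Option 2: flag the gaps instead, so flow-only rows stay usable.
df["total_displacement_missing"] = df["total_displacement"].isna()

print(len(complete), int(df["total_displacement_missing"].sum()))
```

Which option is appropriate depends on the task. With so few rows overall, dropping a quarter of them may cost more than it gains.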
---
## Citation
```bibtex
@dataset{hdx_africa_idmc_idp_data_nga,
title = {Nigeria - Internal Displacements (New Displacements) – IDPs},
author = {Internal Displacement Monitoring Centre (IDMC)},
year = {2026},
url = {https://data.humdata.org/dataset/idmc-idp-data-nga},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*



