electricsheepafrica/africa-idmc-idp-data-tcd
Hugging Face · Updated 2026-04-06 · Indexed 2026-04-12
Download link: https://hf-mirror.com/datasets/electricsheepafrica/africa-idmc-idp-data-tcd
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- tabular-classification
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- conflict-violence
- displacement
- internally-displaced-persons-idp
- natural-disasters
- tcd
pretty_name: "Chad - Internal Displacements (New Displacements) – IDPs"
dataset_info:
splits:
- name: train
num_examples: 12
- name: test
num_examples: 3
---
# Chad - Internal Displacements (New Displacements) – IDPs
**Publisher:** Internal Displacement Monitoring Centre (IDMC) · **Source:** [HDX](https://data.humdata.org/dataset/idmc-idp-data-tcd) · **License:** `cc-by-igo` · **Updated:** 2026-03-18
---
## Abstract
The [Global Internal Displacement Database (GIDD)](http://www.internal-displacement.org/database/displacement-data), maintained by the [Internal Displacement Monitoring Centre (IDMC)](https://www.internal-displacement.org/), provides comprehensive, validated annual estimates of internal displacement worldwide. It defines internally displaced persons (IDPs) in line with the [1998 Guiding Principles](https://www.internal-displacement.org/internal-displacement/guiding-principles-on-internal-displacement/), as people or groups of people who have been forced or obliged to flee or to leave their homes or places of habitual residence, in particular as a result of armed conflict, or to avoid the effects of armed conflict, situations of generalized violence, violations of human rights, or natural or human-made disasters and who have not crossed an international border.
The GIDD tracks two primary metrics. "People Displaced", or population "Stock" figures, is the total number of people living in displacement at year-end. "New Displacement" counts new displacement incidents (population "Flows") rather than individual people, so a person displaced several times is counted once per incident. The GIDD is a key resource for understanding long-term displacement trends and validated displacement figures worldwide. For detailed information and the complete API specification, see the official documentation at https://www.internal-displacement.org/database/api-documentation/.
"Internally displaced persons - IDPs" refers to the number of people living in displacement as of the end of each year.
"Internal displacements (New Displacements)" refers to the number of new displacement incidents recorded, not the number of people displaced, because the same person may be displaced more than once.
Each row in this dataset is a country-level aggregate. Data was last updated on HDX on 2026-03-18. Geographic scope: **TCD** (Chad).
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
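The two metrics defined above are not interchangeable. A year's new displacements (a flow) need not match the change in the year-end IDP stock. People also leave the stock by returning or finding other solutions, and one person can be displaced several times. The sketch below lines the two up year by year. Based on the column names, it assumes that `new_displacement` holds the flow and `total_displacement` holds the year-end stock. The helper name `compare_flows_and_stock` and the toy values are hypothetical and are not IDMC figures.

```python
import pandas as pd


def compare_flows_and_stock(df: pd.DataFrame) -> pd.DataFrame:
    """Line up annual new displacements (flow) with the change in year-end IDPs (stock).

    Hypothetical helper; assumes one row per year, as in this country-level dataset.
    """
    out = df.sort_values("year")[["year", "new_displacement", "total_displacement"]].copy()
    # Year-over-year change in the stock; NaN for the first year in the series.
    # A negative value means the stock shrank even though new displacements occurred.
    out["stock_change"] = out["total_displacement"].diff()
    return out.reset_index(drop=True)


# Hypothetical placeholder values, for illustration only (not IDMC data).
toy = pd.DataFrame({
    "year": [2022, 2020, 2021],
    "new_displacement": [100.0, 200.0, 700.0],
    "total_displacement": [1200, 1000, 1500],
})
result = compare_flows_and_stock(toy)
print(result)
```

In the toy frame the stock falls in the last year even though new displacements were recorded. This is why the two columns should not be summed or used as substitutes for each other.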
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Conflict and security |
| **Unit of observation** | Country-level aggregates |
| **Rows (total)** | 16 |
| **Columns** | 9 (5 numeric, 4 categorical, 0 datetime) |
| **Train split** | 12 rows |
| **Test split** | 3 rows |
| **Geographic scope** | TCD |
| **Publisher** | Internal Displacement Monitoring Centre (IDMC) |
| **HDX last updated** | 2026-03-18 |
---
## Variables
**Geographic**: `iso3` (TCD), `country_name` (Chad).
**Temporal**: `year` (range 2009–2024).
**Displacement figures**: `new_displacement` (range 0–118390), `new_displacement_rounded` (range 5800–118000), `total_displacement` (range 71000–451810), `total_displacement_rounded` (range 71000–452000).
**Identifier / Metadata**: `esa_source` (HDX), `esa_processed` (2026-04-06).
---
## Quick Start
```python
from datasets import load_dataset

# Download both splits from the Hugging Face Hub
ds = load_dataset("electricsheepafrica/africa-idmc-idp-data-tcd")

# Convert each split to a pandas DataFrame
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())
```
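The train/test partition was drawn at random (see Curation), so neither split is a contiguous run of years. For trend analysis it may be more useful to recombine the two splits and sort them by `year`. The sketch below assumes the `train` and `test` DataFrames from the Quick Start code. The helper name `combine_splits` is hypothetical, and the toy frames stand in for the real splits so the example runs offline.

```python
import pandas as pd


def combine_splits(train: pd.DataFrame, test: pd.DataFrame) -> pd.DataFrame:
    """Stack the train and test splits and restore chronological order (hypothetical helper)."""
    full = pd.concat([train, test], ignore_index=True)
    return full.sort_values("year").reset_index(drop=True)


# Toy stand-ins for ds["train"].to_pandas() and ds["test"].to_pandas();
# the values are placeholders, not IDMC figures.
train = pd.DataFrame({"year": [2012, 2009, 2010], "total_displacement": [300, 100, 200]})
test = pd.DataFrame({"year": [2011], "total_displacement": [250]})

full = combine_splits(train, test)
print(full["year"].tolist())  # [2009, 2010, 2011, 2012]
```

On the real data, you can compare `len(full)` with the **Rows (total)** figure under Dataset Characteristics as a quick sanity check.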
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `iso3` | object | 0.0% | TCD |
| `country_name` | object | 0.0% | Chad |
| `year` | int64 | 0.0% | 2009 – 2024 (mean 2016.5) |
| `new_displacement` | float64 | 6.2% | 0.0 – 118390.0 (mean 34661.7333) |
| `new_displacement_rounded` | float64 | 43.8% | 5800.0 – 118000.0 (mean 57533.3333) |
| `total_displacement` | int64 | 0.0% | 71000 – 451810 (mean 201780.6875) |
| `total_displacement_rounded` | int64 | 0.0% | 71000 – 452000 (mean 201812.5) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-06 |
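`new_displacement` and `new_displacement_rounded` hold counts but are stored as `float64`. This is most likely because a plain `int64` column in pandas cannot represent missing values. If integer semantics matter, pandas' nullable `Int64` dtype keeps the gaps as `<NA>`. The sketch below assumes every non-missing value is a whole number, since `astype("Int64")` raises on fractional floats. The helper name `to_nullable_counts` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def to_nullable_counts(
    df: pd.DataFrame,
    columns=("new_displacement", "new_displacement_rounded"),
) -> pd.DataFrame:
    """Return a copy with float count columns cast to pandas' nullable Int64 (hypothetical helper)."""
    out = df.copy()
    for col in columns:
        # "Int64" (capital I) stores NaN as <NA>; non-integral floats raise an error.
        out[col] = out[col].astype("Int64")
    return out


# Placeholder values for illustration only.
toy = pd.DataFrame({
    "new_displacement": [1200.0, np.nan, 0.0],
    "new_displacement_rounded": [np.nan, np.nan, 1000.0],
})
converted = to_nullable_counts(toy)
print(converted.dtypes)
```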
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `year` | 2009.0 | 2024.0 | 2016.5 | 2016.5 |
| `new_displacement` | 0.0 | 118390.0 | 34661.7333 | 36157.0 |
| `new_displacement_rounded` | 5800.0 | 118000.0 | 57533.3333 | 58000.0 |
| `total_displacement` | 71000.0 | 451810.0 | 201780.6875 | 162867.0 |
| `total_displacement_rounded` | 71000.0 | 452000.0 | 201812.5 | 163000.0 |
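You can recompute the numeric summary with pandas. The sketch below assumes the statistics were computed over all rows, with the train and test splits combined. pandas `mean` and `median` skip missing values by default, so the figures for `new_displacement_rounded` describe only its non-missing rows. The helper name `numeric_summary` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def numeric_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Min, max, mean and median for every numeric column; NaNs are skipped (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    # agg returns one row per statistic; transpose to one row per column, as in the table above.
    return numeric.agg(["min", "max", "mean", "median"]).T


# Placeholder values for illustration only.
toy = pd.DataFrame({
    "year": [2009, 2010, 2011, 2012],
    "new_displacement": [100.0, np.nan, 300.0, 500.0],
    "iso3": ["XXX"] * 4,
})
summary = numeric_summary(toy)
print(summary)
```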
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
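The curation steps can be approximated as follows. This is a minimal sketch under stated assumptions, not Electric Sheep Africa's actual pipeline. The snake_case rule, the case-insensitive matching of missing-value markers and the use of `DataFrame.sample` for the 80/20 split are all guesses, so this code is not expected to reproduce the published train/test partition. The helper names are hypothetical. Writing Snappy-compressed Parquet would take one more call, `df.to_parquet(path, compression="snappy")`, which requires `pyarrow` or `fastparquet` to be installed.

```python
import re

import numpy as np
import pandas as pd

# Markers listed in the Curation section; matching them case-insensitively is an assumption.
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def to_snake_case(name: str) -> str:
    """Lowercase a column name and collapse runs of non-alphanumerics into underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def unify_missing(value):
    """Map a recognised missing-value marker to NaN; leave every other value unchanged."""
    if isinstance(value, str) and value.strip().lower() in MISSING_MARKERS:
        return np.nan
    return value


def curate(raw: pd.DataFrame, seed: int = 42, train_frac: float = 0.8):
    """Hypothetical re-creation of the described steps: rename, unify missing values, split."""
    df = raw.rename(columns=to_snake_case)
    for col in df.columns:
        df[col] = df[col].map(unify_missing)
    # Random 80/20 split with a fixed seed (assumed implementation via DataFrame.sample).
    train = df.sample(frac=train_frac, random_state=seed)
    test = df.drop(index=train.index)
    return train.reset_index(drop=True), test.reset_index(drop=True)


# Placeholder raw table for illustration only.
raw = pd.DataFrame({
    "ISO3": ["XXX"] * 5,
    "New Displacement": ["10", "N/A", "30", "-", "50"],
})
train, test = curate(raw)
print(list(train.columns), len(train), len(test))
```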
---
## Limitations
- Data originates from the Internal Displacement Monitoring Centre (IDMC) and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- The following columns have >20% missing values and should be treated with caution in modelling: `new_displacement_rounded`.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/idmc-idp-data-tcd) for the publisher's own methodology notes and caveats.
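You can recheck the missing-value percentages in the Schema table, and flag columns above the 20% threshold, with a few lines of pandas. In the sketch below, the helper name `high_missing_columns` and the toy values are hypothetical, and the default threshold simply mirrors the bullet above.

```python
import numpy as np
import pandas as pd


def high_missing_columns(df: pd.DataFrame, threshold_pct: float = 20.0) -> pd.Series:
    """Percentage of missing values for each column above threshold_pct, highest first (hypothetical helper)."""
    null_pct = df.isna().mean() * 100
    return null_pct[null_pct > threshold_pct].sort_values(ascending=False)


# Placeholder values for illustration only.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],        # 0% missing
    "b": [np.nan, 2.0, np.nan, 4.0],  # 50% missing
    "c": [1.0, np.nan, 3.0, 4.0],     # 25% missing
})
result = high_missing_columns(toy)
print(result)
```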
---
## Citation
```bibtex
@dataset{hdx_africa_idmc_idp_data_tcd,
title = {Chad - Internal Displacements (New Displacements) – IDPs},
author = {Internal Displacement Monitoring Centre (IDMC)},
year = {2026},
url = {https://data.humdata.org/dataset/idmc-idp-data-tcd},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*