electricsheepafrica/africa-demographics-chad
Source: Hugging Face · Updated 2026-04-21 · Indexed 2026-04-26
Download link: https://hf-mirror.com/datasets/electricsheepafrica/africa-demographics-chad
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: other
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
- other
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- demographics
- health
- tcd
pretty_name: "Chad - Subnational Demographic and Health Data"
dataset_info:
  splits:
  - name: train
    num_examples: 872
  - name: test
    num_examples: 218
---
# Chad - Subnational Demographic and Health Data
**Publisher:** The DHS Program · **Source:** [HDX](https://data.humdata.org/dataset/dhs-subnational-data-for-chad) · **License:** `hdx-other` · **Updated:** 2026-04-20
---
## Abstract
This dataset contains data from the [DHS data portal](https://api.dhsprogram.com/). A companion national-level dataset, [Chad - National Demographic and Health Data](https://data.humdata.org/dataset/dhs-data-for-chad), is also available on HDX.
The DHS Program Application Programming Interface (API) provides software developers access to aggregated indicator data from The Demographic and Health Surveys (DHS) Program. The API can be used to create various applications to help analyze, visualize, explore and disseminate data on population, health, HIV, and nutrition from more than 90 countries.
Each row in this dataset represents one observation for a first-level administrative unit. The data was last updated on HDX on 2026-04-20. Geographic scope: **TCD**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Public health |
| **Unit of observation** | First-level administrative unit observations |
| **Rows (total)** | 1,091 |
| **Columns** | 30 (13 numeric, 17 categorical, 0 datetime) |
| **Train split** | 872 rows |
| **Test split** | 218 rows |
| **Geographic scope** | TCD |
| **Publisher** | The DHS Program |
| **HDX last updated** | 2026-04-20 |
---
## Variables
**Geographic** — `iso3` (TCD), `location` (Zone 2, Zone 3, Zone 4), `dhs_countrycode` (TD), `countryname` (Chad), `surveyyear` (range 1997.0–2014.0) and 8 others.
**Outcome / Measurement** — `value` (range 0.0–256.0), `istotal` (range 0.0–0.0).
**Identifier / Metadata** — `dataid` (range 10552.0–7979564.0), `indicatorid` (RH_DELP_C_DHF, CH_DIAT_C_ORT, CM_ECMR_C_U5M), `characteristicid` (range 445001.0–445038.0), `characteristiclabel` (Zone 2, Zone 3, Zone 4), `ispreferred` (range 0.0–1.0) and 3 others.
**Other** — `indicator` (Place of delivery: Health facility, Treatment of diarrhea: Either ORS or RHF, Under-five mortality rate), `precision` (range 0.0–1.0), `indicatororder` (range 11763080.0–260321010.0), `characteristicorder` (range 1445001.0–1445082.0), `denominatorweighted` (range 4.0–5856.0) and 2 others.
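
As a rough sketch of how these variable groups fit together, the snippet below builds a small hand-made frame that reuses the dataset's column names, keeps the rows flagged `ispreferred == 1`, and pivots them so that each `indicatorid` becomes a column. The values are invented for illustration and are not taken from the dataset. Treating `ispreferred` as a "keep this estimate" flag is an assumption based on the column name, so check the DHS API documentation before relying on it.

```python
import pandas as pd

# Toy rows using the dataset's column names; all values are invented.
toy = pd.DataFrame({
    "surveyid": ["TD2014DHS", "TD2014DHS", "TD2014DHS",
                 "TD2014DHS", "TD2004DHS", "TD2004DHS"],
    "characteristiclabel": ["Zone 2", "Zone 3", "Zone 2",
                            "Zone 3", "Zone 2", "Zone 2"],
    "indicatorid": ["RH_DELP_C_DHF", "RH_DELP_C_DHF", "CM_ECMR_C_U5M",
                    "CM_ECMR_C_U5M", "RH_DELP_C_DHF", "RH_DELP_C_DHF"],
    "value": [20.0, 35.0, 150.0, 120.0, 10.0, 11.0],
    "ispreferred": [1, 1, 1, 1, 1, 0],
})

# Keep only rows flagged as preferred (assumed meaning of the flag).
preferred = toy[toy["ispreferred"] == 1]

# Reshape from long format to one row per survey and zone,
# with one column per indicator.
wide = preferred.pivot_table(
    index=["surveyid", "characteristiclabel"],
    columns="indicatorid",
    values="value",
    aggfunc="first",
)
print(wide)
```

Combinations with no matching row, such as the 2004 survey and the mortality indicator in this toy frame, appear as `NaN` in the wide table.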
---
## Quick Start
```python
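from datasets import load_dataset

# Download both splits from the Hugging Face Hub.
ds = load_dataset("electricsheepafrica/africa-demographics-chad")

# Convert each split to a pandas DataFrame.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())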
```
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `iso3` | object | 0.0% | TCD |
| `location` | object | 0.0% | Zone 2, Zone 3, Zone 4 |
| `dataid` | int64 | 0.0% | 10552.0 – 7979564.0 (mean 4215791.2337) |
| `indicator` | object | 0.0% | Place of delivery: Health facility, Treatment of diarrhea: Either ORS or RHF, Under-five mortality rate |
| `value` | float64 | 0.0% | 0.0 – 256.0 (mean 23.6657) |
| `precision` | int64 | 0.0% | 0.0 – 1.0 (mean 0.9267) |
| `dhs_countrycode` | object | 0.0% | TD |
| `countryname` | object | 0.0% | Chad |
| `surveyyear` | int64 | 0.0% | 1997.0 – 2014.0 (mean 2011.0779) |
| `surveyid` | object | 0.0% | TD2014DHS, TD2004DHS, TD1997DHS |
| `indicatorid` | object | 0.0% | RH_DELP_C_DHF, CH_DIAT_C_ORT, CM_ECMR_C_U5M |
| `indicatororder` | int64 | 0.0% | 11763080.0 – 260321010.0 (mean 104889642.1173) |
| `indicatortype` | object | 0.0% | I |
| `characteristicid` | int64 | 0.0% | 445001.0 – 445038.0 (mean 445021.0027) |
| `characteristicorder` | int64 | 0.0% | 1445001.0 – 1445082.0 (mean 1445042.6948) |
| `characteristiccategory` | object | 0.0% | Region |
| `characteristiclabel` | object | 0.0% | Zone 2, Zone 3, Zone 4 |
| `byvariableid` | int64 | 0.0% | 0.0 – 631001.0 (mean 19787.7434) |
| `byvariablelabel` | object | 71.8% | |
| `istotal` | int64 | 0.0% | 0.0 – 0.0 (mean 0.0) |
| `ispreferred` | int64 | 0.0% | 0.0 – 1.0 (mean 0.8643) |
| `sdrid` | object | 0.0% | |
| `regionid` | object | 0.0% | |
| `surveyyearlabel` | object | 0.0% | |
| `surveytype` | object | 0.0% | |
| `denominatorweighted` | float64 | 22.0% | 4.0 – 5856.0 (mean 694.4618) |
| `denominatorunweighted` | float64 | 22.0% | 32.0 – 4535.0 (mean 702.7861) |
| `levelrank` | float64 | 45.8% | 1.0 – 1.0 (mean 1.0) |
| `esa_source` | object | 0.0% | |
| `esa_processed` | object | 0.0% | |
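
The schema lists `levelrank` as `float64` even though its only observed value is 1.0. A likely reason (an assumption, not something the source states) is that NumPy-backed integer columns cannot hold `NaN`, so pandas stores whole numbers with gaps as floats. The sketch below uses a toy column to show one way to recover an integer type with pandas' nullable `Int64` dtype.

```python
import numpy as np
import pandas as pd

# Toy column mimicking `levelrank`: whole numbers with gaps, stored as float64.
df = pd.DataFrame({"levelrank": [1.0, np.nan, 1.0, np.nan]})
print(df["levelrank"].dtype)

# Cast to the nullable integer dtype; missing entries become <NA>.
df["levelrank"] = df["levelrank"].astype("Int64")
print(df["levelrank"].dtype)
print(df["levelrank"].isna().sum())
```

The same cast would apply to other whole-number columns with missing values, provided every non-missing entry is integral.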
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `dataid` | 10552.0 | 7979564.0 | 4215791.2337 | 4286803.0 |
| `value` | 0.0 | 256.0 | 23.6657 | 14.9 |
| `precision` | 0.0 | 1.0 | 0.9267 | 1.0 |
| `surveyyear` | 1997.0 | 2014.0 | 2011.0779 | 2014.0 |
| `indicatororder` | 11763080.0 | 260321010.0 | 104889642.1173 | 94096040.0 |
| `characteristicid` | 445001.0 | 445038.0 | 445021.0027 | 445020.0 |
| `characteristicorder` | 1445001.0 | 1445082.0 | 1445042.6948 | 1445041.0 |
| `byvariableid` | 0.0 | 631001.0 | 19787.7434 | 0.0 |
| `istotal` | 0.0 | 0.0 | 0.0 | 0.0 |
| `ispreferred` | 0.0 | 1.0 | 0.8643 | 1.0 |
| `denominatorweighted` | 4.0 | 5856.0 | 694.4618 | 516.0 |
| `denominatorunweighted` | 32.0 | 4535.0 | 702.7861 | 538.0 |
| `levelrank` | 1.0 | 1.0 | 1.0 | 1.0 |
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. Two columns with more than 80% missing values, `cilow` and `cihigh`, were removed. The dataset was then split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
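
The steps above can be approximated in pandas as follows. This is a minimal sketch, not the actual Electric Sheep Africa pipeline: the function name `curate`, the case-insensitive marker matching, the regex used for column names, and the use of `DataFrame.sample` for the split are all assumptions, and the input frame is invented toy data.

```python
import re

import numpy as np
import pandas as pd

# Missing-value markers from the card, compared case-insensitively (assumption).
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def curate(raw, max_missing=0.8, seed=42):
    """Hypothetical re-creation of the curation steps described in the card."""
    df = raw.copy()
    # Lowercase the column names and replace non-word runs with underscores.
    df.columns = [re.sub(r"\W+", "_", str(c).strip().lower()) for c in df.columns]

    # Replace the textual missing-value markers with NaN.
    def to_nan(v):
        if isinstance(v, str) and v.strip().lower() in MISSING_MARKERS:
            return np.nan
        return v

    df = df.apply(lambda col: col.map(to_nan))
    # Drop columns whose share of missing values exceeds the threshold.
    df = df.loc[:, df.isna().mean() <= max_missing]
    # 80/20 split with a fixed seed.
    test = df.sample(frac=0.2, random_state=seed)
    train = df.drop(index=test.index)
    return train, test


# Invented toy input: "CILow" is 90% missing and is therefore dropped.
raw = pd.DataFrame({
    "Location": [f"Zone {i}" for i in range(1, 11)],
    "Value": [1.0, "N/A", 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "CILow": ["-"] * 9 + [0.5],
})
train, test = curate(raw)
print(train.shape, test.shape)
```

Saving the result with `DataFrame.to_parquet(..., compression="snappy")` would match the storage format named above, assuming a Parquet engine such as `pyarrow` is installed.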
---
## Limitations
- Data originates from The DHS Program and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- The following columns have >20% missing values and should be treated with caution in modelling: `byvariablelabel`, `denominatorweighted`, `denominatorunweighted`, `levelrank`.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/dhs-subnational-data-for-chad) for the publisher's own methodology notes and caveats.
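
One way to act on the missing-value caveat is to measure the null share per column and decide explicitly which rows to keep. The sketch below does this on an invented toy frame that reuses three of the dataset's column names. The 20% threshold mirrors the one quoted above; the sample label string is made up.

```python
import numpy as np
import pandas as pd

# Toy frame with invented values; column names follow the schema.
toy = pd.DataFrame({
    "value": [12.5, 40.1, 7.3, 88.0, 15.2],
    "denominatorweighted": [516.0, np.nan, 230.0, np.nan, 1040.0],
    "byvariablelabel": [None, None, None, "example label", None],
})

# Share of missing entries per column.
null_share = toy.isna().mean()

# Columns above the 20% threshold mentioned in the limitations.
flagged = null_share[null_share > 0.20].index.tolist()
print(flagged)

# Keep only rows where the weighted denominator is present.
complete = toy.dropna(subset=["denominatorweighted"])
print(len(complete))
```

Whether to drop rows, impute, or exclude the flagged columns depends on the modelling task.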
---
## Citation
```bibtex
@dataset{hdx_africa_demographics_chad,
title = {Chad - Subnational Demographic and Health Data},
author = {The DHS Program},
year = {2026},
url = {https://data.humdata.org/dataset/dhs-subnational-data-for-chad},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*