electricsheepafrica/africa-demographics-cote-divoire
Hugging Face · Updated 2026-04-21 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-demographics-cote-divoire
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: other
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
- other
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- demographics
- health
- civ
pretty_name: "Côte d'Ivoire - Subnational Demographic and Health Data"
dataset_info:
splits:
- name: train
num_examples: 905
- name: test
num_examples: 226
---
# Côte d'Ivoire - Subnational Demographic and Health Data
**Publisher:** The DHS Program · **Source:** [HDX](https://data.humdata.org/dataset/dhs-subnational-data-for-cote-d-ivoire) · **License:** `hdx-other` · **Updated:** 2026-04-20
---
## Abstract
Contains data from the [DHS data portal](https://api.dhsprogram.com/). There is also a dataset containing [Côte d'Ivoire - National Demographic and Health Data](https://data.humdata.org/dataset/dhs-data-for-cote-d-ivoire) on HDX.
The DHS Program Application Programming Interface (API) provides software developers access to aggregated indicator data from The Demographic and Health Surveys (DHS) Program. The API can be used to create various applications to help analyze, visualize, explore and disseminate data on population, health, HIV, and nutrition from more than 90 countries.
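The source rows come from this API. As a rough sketch of how a subnational query might be assembled, the snippet below builds a request URL for Côte d'Ivoire (DHS country code `CI`). The endpoint path and the parameter names (`countryIds`, `indicatorIds`, `breakdown`, `perpage`, `page`, `f`) are assumptions based on the public DHS API documentation and should be checked there. The helper `build_dhs_data_url` is hypothetical, and the snippet only builds the URL without sending any request.

```python
from urllib.parse import urlencode

# Assumed DHS Program API endpoint for aggregated indicator data; verify in the API docs.
DHS_DATA_ENDPOINT = "https://api.dhsprogram.com/rest/dhs/data"


def build_dhs_data_url(country_id, indicator_ids, breakdown="subnational",
                       per_page=1000, page=1):
    """Return a query URL for aggregated DHS indicator data (hypothetical helper).

    All parameter names are assumptions taken from the DHS API documentation.
    """
    params = {
        "countryIds": country_id,
        "indicatorIds": ",".join(indicator_ids),
        "breakdown": breakdown,  # "subnational" is assumed to request region-level rows
        "perpage": per_page,
        "page": page,
        "f": "json",
    }
    # safe="," keeps the comma-separated indicator list readable in the URL.
    return f"{DHS_DATA_ENDPOINT}?{urlencode(params, safe=',')}"


url = build_dhs_data_url("CI", ["FE_FRTR_W_TFR", "RH_DELP_C_DHF"])
print(url)
```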
Each row is one indicator estimate for one first-level administrative unit (region) from one survey. Data was last updated on HDX on 2026-04-20. Geographic scope: **CIV** (Côte d'Ivoire).
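Because the table is in long format (one row per region, survey and indicator), a common first step is to pivot one survey into a region-by-indicator table. This is a sketch on a tiny made-up frame: the values are illustrative only, the helper `region_by_indicator` is hypothetical, and keeping only rows with `ispreferred == 1` is an assumption intended to avoid duplicate estimates for the same region and indicator.

```python
import pandas as pd

# Made-up rows in the dataset's long layout; the numbers are NOT real DHS values.
rows = pd.DataFrame({
    "surveyid": ["CI2021DHS"] * 4,
    "characteristiclabel": ["Abidjan", "Abidjan", "Centre", "Centre"],
    "indicatorid": ["FE_FRTR_W_TFR", "RH_DELP_C_DHF"] * 2,
    "value": [3.1, 95.0, 4.9, 80.0],
    "ispreferred": [1, 1, 1, 1],
})


def region_by_indicator(df, survey_id):
    """Pivot one survey's rows into a region x indicator table (hypothetical helper)."""
    # Assumption: ispreferred == 1 marks the estimate to keep when several exist.
    subset = df[(df["surveyid"] == survey_id) & (df["ispreferred"] == 1)]
    return subset.pivot_table(
        index="characteristiclabel",
        columns="indicatorid",
        values="value",
        aggfunc="first",
    )


wide = region_by_indicator(rows, "CI2021DHS")
print(wide)
```

On the real data the same call would take `train` from the Quick Start section. Because the 80/20 split is random, a single split may not contain every region and indicator for a given survey; concatenating `train` and `test` restores the full table.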
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Public health |
| **Unit of observation** | One indicator estimate per first-level administrative unit (region) and survey |
| **Rows (total)** | 1,131 (905 train + 226 test) |
| **Columns** | 30 (13 numeric, 17 categorical, 0 datetime) |
| **Train split** | 905 rows |
| **Test split** | 226 rows |
| **Geographic scope** | CIV |
| **Publisher** | The DHS Program |
| **HDX last updated** | 2026-04-20 |
---
## Variables
**Geographic**: `iso3` (CIV), `location` (e.g. Abidjan, Centre, Bas Sassandra), `dhs_countrycode` (CI), `countryname` (Cote d'Ivoire), `surveyyear` (1994–2021) and 8 others.
**Outcome / Measurement**: `value` (0–209; the unit depends on the indicator), `istotal` (constant 0).
**Identifier / Metadata**: `dataid` (171–7981373), `indicatorid` (e.g. RH_DELP_C_DHF, CH_DIAT_C_ORT, FE_FRTR_W_TFR), `characteristicid` (395001–395027), `characteristiclabel` (e.g. Abidjan, Centre, Bas Sassandra), `ispreferred` (0/1 flag) and 3 others.
**Other**: `indicator` (e.g. Place of delivery: Health facility; Treatment of diarrhea: Either ORS or RHF; Total fertility rate 15-49), `precision` (0–1), `indicatororder` (11763080–260321010), `characteristicorder` (1395010–1395190), `denominatorweighted` (9–4689) and 2 others.
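`value` pools indicators measured on different scales, for example a total fertility rate (births per woman) next to percentages such as deliveries in a health facility, so its overall mean and range are hard to interpret. A minimal sketch of summarising it per indicator instead, on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows: a fertility rate and a percentage share the same "value" column.
rows = pd.DataFrame({
    "indicator": [
        "Total fertility rate 15-49", "Total fertility rate 15-49",
        "Place of delivery: Health facility", "Place of delivery: Health facility",
    ],
    "value": [3.0, 5.0, 60.0, 90.0],
})

# Group by indicator so rates and percentages are never pooled into one statistic.
per_indicator = rows.groupby("indicator")["value"].agg(["count", "min", "max", "median"])
print(per_indicator)
```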
---
## Quick Start
```python
from datasets import load_dataset
ds = load_dataset("electricsheepafrica/africa-demographics-cote-divoire")
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()
print(train.shape)
train.head()
```
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `iso3` | object | 0.0% | CIV |
| `location` | object | 0.0% | Abidjan, Centre, Bas Sassandra |
| `dataid` | int64 | 0.0% | 171.0 – 7981373.0 (mean 4415574.7447) |
| `indicator` | object | 0.0% | Place of delivery: Health facility, Treatment of diarrhea: Either ORS or RHF, Total fertility rate 15-49 |
| `value` | float64 | 0.0% | 0.0 – 209.0 (mean 31.308) |
| `precision` | int64 | 0.0% | 0.0 – 1.0 (mean 0.9152) |
| `dhs_countrycode` | object | 0.0% | CI |
| `countryname` | object | 0.0% | Cote d'Ivoire |
| `surveyyear` | int64 | 0.0% | 1994.0 – 2021.0 (mean 2011.22) |
| `surveyid` | object | 0.0% | CI2021DHS, CI2012DHS, CI2005AIS |
| `indicatorid` | object | 0.0% | RH_DELP_C_DHF, CH_DIAT_C_ORT, FE_FRTR_W_TFR |
| `indicatororder` | int64 | 0.0% | 11763080.0 – 260321010.0 (mean 104964150.1678) |
| `indicatortype` | object | 0.0% | I |
| `characteristicid` | int64 | 0.0% | 395001.0 – 395027.0 (mean 395012.242) |
| `characteristicorder` | int64 | 0.0% | 1395010.0 – 1395190.0 (mean 1395103.0813) |
| `characteristiccategory` | object | 0.0% | Region |
| `characteristiclabel` | object | 0.0% | Abidjan, Centre, Bas Sassandra |
| `byvariableid` | int64 | 0.0% | 0.0 – 631002.0 (mean 27481.8887) |
| `byvariablelabel` | object | 71.1% | |
| `istotal` | int64 | 0.0% | 0.0 – 0.0 (mean 0.0) |
| `ispreferred` | int64 | 0.0% | 0.0 – 1.0 (mean 0.879) |
| `sdrid` | object | 0.0% | |
| `regionid` | object | 0.0% | |
| `surveyyearlabel` | object | 0.0% | |
| `surveytype` | object | 0.0% | |
| `denominatorweighted` | float64 | 24.5% | 9.0 – 4689.0 (mean 547.8784) |
| `denominatorunweighted` | float64 | 24.5% | 34.0 – 4247.0 (mean 586.0386) |
| `levelrank` | float64 | 22.2% | 1.0 – 1.0 (mean 1.0) |
| `esa_source` | object | 0.0% | |
| `esa_processed` | object | 0.0% | |
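The dtype and null-percentage columns above can be recomputed with plain pandas. A minimal sketch with a hypothetical helper `schema_report`, shown on a small made-up frame; on the real data it would be applied to a split such as `train` from the Quick Start, and percentages computed on a single split may differ slightly from the table.

```python
import numpy as np
import pandas as pd


def schema_report(df):
    """Return the dtype and the percentage of missing values for every column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "null_pct": (df.isna().mean() * 100).round(1),
    })


# Hypothetical frame: one complete column and one half-missing numeric column.
demo = pd.DataFrame({
    "iso3": ["CIV", "CIV", "CIV", "CIV"],
    "denominatorweighted": [9.0, np.nan, 367.0, np.nan],
})
report = schema_report(demo)
print(report)
```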
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `dataid` | 171.0 | 7981373.0 | 4415574.7447 | 4500569.0 |
| `value` | 0.0 | 209.0 | 31.308 | 20.6 |
| `precision` | 0.0 | 1.0 | 0.9152 | 1.0 |
| `surveyyear` | 1994.0 | 2021.0 | 2011.22 | 2012.0 |
| `indicatororder` | 11763080.0 | 260321010.0 | 104964150.1678 | 94096040.0 |
| `characteristicid` | 395001.0 | 395027.0 | 395012.242 | 395012.0 |
| `characteristicorder` | 1395010.0 | 1395190.0 | 1395103.0813 | 1395110.0 |
| `byvariableid` | 0.0 | 631002.0 | 27481.8887 | 0.0 |
| `istotal` | 0.0 | 0.0 | 0.0 | 0.0 |
| `ispreferred` | 0.0 | 1.0 | 0.879 | 1.0 |
| `denominatorweighted` | 9.0 | 4689.0 | 547.8784 | 367.0 |
| `denominatorunweighted` | 34.0 | 4247.0 | 586.0386 | 468.0 |
| `levelrank` | 1.0 | 1.0 | 1.0 | 1.0 |
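A similar sketch reproduces this kind of numeric summary with `select_dtypes` and `agg`; missing values are skipped, as pandas does by default. The helper name `numeric_summary` and the demo values are hypothetical.

```python
import pandas as pd


def numeric_summary(df):
    """Min, max, mean and median for every numeric column (NaNs are skipped)."""
    numeric = df.select_dtypes(include="number")
    # agg returns statistics as rows; transpose so each column becomes a row.
    return numeric.agg(["min", "max", "mean", "median"]).T


# Hypothetical values, not taken from the dataset; iso3 is non-numeric and is excluded.
demo = pd.DataFrame({
    "value": [0.0, 20.6, 73.3],
    "istotal": [0, 0, 0],
    "iso3": ["CIV"] * 3,
})
summary = numeric_summary(demo)
print(summary)
```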
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. Two columns with more than 80% missing values were removed: `cilow` and `cihigh` (the lower and upper confidence-interval bounds). The dataset was then split 80/20 into train and test partitions using a fixed random seed (42), and both partitions were saved as Snappy-compressed Parquet.
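The curation steps can be approximated as below. This is a sketch under assumptions, not Electric Sheep Africa's actual pipeline: the card does not say which tooling performed the split, so `DataFrame.sample(frac=0.8, random_state=42)` is one plausible choice, and the markers are matched exactly (case-sensitively). The snake_case helper only lowercases and collapses non-alphanumeric runs to underscores, which matches names such as `dhs_countrycode` but does not split camelCase. `curate` and `to_snake_case` are hypothetical names, and the demo frame is made up.

```python
import re

import numpy as np
import pandas as pd

# Missing-value markers listed in the Curation section (exact, case-sensitive match).
MISSING_MARKERS = ["N/A", "null", "none", "-", "unknown", "no data", "#N/A"]


def to_snake_case(name):
    """Lowercase a column name and collapse non-alphanumeric runs to '_'."""
    return re.sub(r"[^0-9a-z]+", "_", str(name).strip().lower()).strip("_")


def curate(df, max_missing=0.8, seed=42, train_frac=0.8):
    """Approximate the described cleaning steps; return (train, test)."""
    out = df.rename(columns=to_snake_case).reset_index(drop=True)
    # Replace any cell equal to a missing marker with NaN.
    out = out.mask(out.isin(MISSING_MARKERS))
    # Drop columns whose share of missing values exceeds the threshold.
    out = out.loc[:, out.isna().mean() <= max_missing]
    # Seeded random split: sample the training rows, the remainder is the test set.
    train = out.sample(frac=train_frac, random_state=seed)
    test = out.drop(index=train.index)
    return train, test


# Made-up raw frame: "N/A" and "-" markers, plus a mostly empty CI column.
raw = pd.DataFrame({
    "IndicatorId": ["FE_FRTR_W_TFR"] * 10,
    "Value": [4.0, 5.0, "N/A", 3.5, 4.2, 4.8, 5.1, 3.9, "-", 4.4],
    "CILow": [np.nan] * 9 + [3.0],
})
train, test = curate(raw)
print(len(train), len(test), list(train.columns))
```

Re-running `curate` with the same seed returns the same partition, which is the point of fixing the seed.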
---
## Limitations
- Data originates from The DHS Program and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
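- The following columns have more than 20% missing values and should be treated with caution in modelling: `byvariablelabel`, `denominatorweighted`, `denominatorunweighted`, `levelrank`. Two numeric columns are also constant: `istotal` is always 0, and `levelrank` is 1.0 wherever it is present.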
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/dhs-subnational-data-for-cote-d-ivoire) for the publisher's own methodology notes and caveats.
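One way to act on these caveats before modelling is to drop constant columns and add explicit missingness indicators for heavily incomplete numeric columns. The sketch below is an assumption about reasonable preprocessing, not a recommendation from the publisher: `prepare_features` is hypothetical, the demo values are made up, and the 20% threshold simply mirrors the one used above. Because it drops `levelrank` as constant first, any signal carried by that column's missingness is discarded as well.

```python
import numpy as np
import pandas as pd


def prepare_features(df, missing_threshold=0.2):
    """Drop constant columns and add 0/1 flags for heavily missing numeric columns."""
    # A column with at most one distinct non-null value (e.g. istotal) carries no signal.
    constant = [c for c in df.columns if df[c].nunique(dropna=True) <= 1]
    out = df.drop(columns=constant)
    # Numeric columns whose missing share exceeds the threshold get an indicator,
    # so a model can see "was missing" rather than only an imputed value.
    heavy = [
        c for c in out.select_dtypes(include="number").columns
        if out[c].isna().mean() > missing_threshold
    ]
    for c in heavy:
        out[f"{c}_missing"] = out[c].isna().astype(int)
    return out, constant, heavy


# Made-up frame mimicking the patterns described above.
demo = pd.DataFrame({
    "istotal": [0, 0, 0, 0, 0],
    "levelrank": [1.0, 1.0, np.nan, 1.0, 1.0],
    "denominatorweighted": [9.0, np.nan, 367.0, np.nan, 4689.0],
    "value": [3.1, 95.0, 4.9, 80.0, 20.6],
})
features, dropped, flagged = prepare_features(demo)
print(dropped, flagged)
print(features)
```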
---
## Citation
```bibtex
@dataset{hdx_africa_demographics_cote_divoire,
title = {Côte d'Ivoire - Subnational Demographic and Health Data},
author = {The DHS Program},
year = {2026},
url = {https://data.humdata.org/dataset/dhs-subnational-data-for-cote-d-ivoire},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider:
electricsheepafrica



