electricsheepafrica/africa-demographics-nigeria

Hugging Face · Updated 2026-04-21 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-demographics-nigeria
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: other
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
- other
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- demographics
- health
- nga
pretty_name: "Nigeria - Subnational Demographic and Health Data"
dataset_info:
  splits:
  - name: train
    num_examples: 3333
  - name: test
    num_examples: 833
---

# Nigeria - Subnational Demographic and Health Data

**Publisher:** The DHS Program · **Source:** [HDX](https://data.humdata.org/dataset/dhs-subnational-data-for-nigeria) · **License:** `hdx-other` · **Updated:** 2026-04-20

---

## Abstract

Contains data from the [DHS data portal](https://api.dhsprogram.com/). There is also a dataset containing [Nigeria - National Demographic and Health Data](https://data.humdata.org/dataset/dhs-data-for-nigeria) on HDX.

The DHS Program Application Programming Interface (API) provides software developers access to aggregated indicator data from The Demographic and Health Surveys (DHS) Program. The API can be used to create various applications to help analyze, visualize, explore and disseminate data on population, health, HIV, and nutrition from more than 90 countries.

Each row in this dataset represents first-level administrative unit observations. Data was last updated on HDX on 2026-04-20. Geographic scope: **NGA**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*

---

## Dataset Characteristics

| | |
|---|---|
| **Domain** | Public health |
| **Unit of observation** | First-level administrative unit observations |
| **Rows (total)** | 4,167 |
| **Columns** | 30 (14 numeric, 16 categorical, 0 datetime) |
| **Train split** | 3,333 rows |
| **Test split** | 833 rows |
| **Geographic scope** | NGA |
| **Publisher** | The DHS Program |
| **HDX last updated** | 2026-04-20 |

---

## Variables

**Geographic:** `iso3` (NGA), `location` (South South, North Central, North East), `dhs_countrycode` (NG), `countryname` (Nigeria), `surveyyear` (range 1990.0–2024.0) and 8 others.

**Outcome / Measurement:** `value` (range 0.0–269.0), `istotal` (range 0.0–0.0).

**Identifier / Metadata:** `dataid` (range 880.0–7981394.0), `indicatorid` (RH_DELP_C_DHF, FE_FRTR_W_TFR, ED_EDUC_W_SEH), `characteristicid` (range 433001.0–433066.0), `characteristiclabel` (South South, North Central, North East), `ispreferred` (range 0.0–1.0) and 3 others.

**Other:** `indicator` (Place of delivery: Health facility, Total fertility rate 15-49, Women with secondary or higher education), `precision` (range 0.0–1.0), `indicatororder` (range 11763080.0–260321010.0), `characteristicorder` (range 1433001.0–1433066.0), `denominatorweighted` (range 11.0–12558.0) and 2 others.
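
Because the table is in long format (one row per indicator, region and survey), a common first step is to pick one `indicatorid` and reshape it into a region-by-year grid. The following is a minimal sketch of that step, not part of the published card. The helper name `indicator_by_region` is hypothetical, the toy rows use placeholder values rather than real DHS figures, and treating `ispreferred == 1` as "the estimate to keep" is an assumption that the card does not confirm.

```python
import pandas as pd


def indicator_by_region(df: pd.DataFrame, indicator_id: str) -> pd.DataFrame:
    """Return a location x surveyyear table of `value` for one indicator.

    Hypothetical helper: keeps rows flagged ispreferred == 1 (assumed meaning).
    """
    mask = (df["indicatorid"] == indicator_id) & (df["ispreferred"] == 1)
    subset = df.loc[mask, ["location", "surveyyear", "value"]]
    # aggfunc="first" keeps a single value if a cell still has duplicates.
    return subset.pivot_table(
        index="location", columns="surveyyear", values="value", aggfunc="first"
    )


# Toy rows with PLACEHOLDER values (not real survey results); only the
# column names and label strings are taken from the card.
toy = pd.DataFrame(
    {
        "location": ["North East", "North East", "South South", "South South"],
        "surveyyear": [2013, 2018, 2013, 2018],
        "indicatorid": ["FE_FRTR_W_TFR"] * 3 + ["ED_EDUC_W_SEH"],
        "ispreferred": [1, 1, 1, 1],
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)

table = indicator_by_region(toy, "FE_FRTR_W_TFR")
print(table)
```

The same call should work on a DataFrame loaded from the `train` or `test` split, provided the column names match the schema listed in this card.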

---

## Quick Start

```python
from datasets import load_dataset

# Download both splits from the Hugging Face Hub
ds = load_dataset("electricsheepafrica/africa-demographics-nigeria")

# Convert each split to a pandas DataFrame
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
train.head()
```

---

## Schema

| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `iso3` | object | 0.0% | NGA |
| `location` | object | 0.0% | South South, North Central, North East |
| `dataid` | int64 | 0.0% | 880.0 – 7981394.0 (mean 4192770.3321) |
| `indicator` | object | 0.0% | Place of delivery: Health facility, Total fertility rate 15-49, Women with secondary or higher education |
| `value` | float64 | 0.0% | 0.0 – 269.0 (mean 34.6533) |
| `precision` | int64 | 0.0% | 0.0 – 1.0 (mean 0.9275) |
| `dhs_countrycode` | object | 0.0% | NG |
| `countryname` | object | 0.0% | Nigeria |
| `surveyyear` | int64 | 0.0% | 1990.0 – 2024.0 (mean 2016.7384) |
| `surveyid` | object | 0.0% | NG2024DHS, NG2013DHS, NG2018DHS |
| `indicatorid` | object | 0.0% | RH_DELP_C_DHF, FE_FRTR_W_TFR, ED_EDUC_W_SEH |
| `indicatororder` | int64 | 0.0% | 11763080.0 – 260321010.0 (mean 111507601.915) |
| `indicatortype` | object | 0.0% | I |
| `characteristicid` | int64 | 0.0% | 433001.0 – 433066.0 (mean 433030.3393) |
| `characteristicorder` | int64 | 0.0% | 1433001.0 – 1433066.0 (mean 1433036.2393) |
| `characteristiccategory` | object | 0.0% | Region |
| `characteristiclabel` | object | 0.0% | South South, North Central, North East |
| `byvariableid` | int64 | 0.0% | 0.0 – 631002.0 (mean 30069.0283) |
| `byvariablelabel` | object | 73.5% | |
| `istotal` | int64 | 0.0% | 0.0 – 0.0 (mean 0.0) |
| `ispreferred` | int64 | 0.0% | 0.0 – 1.0 (mean 0.8817) |
| `sdrid` | object | 0.0% | |
| `regionid` | object | 0.0% | |
| `surveyyearlabel` | int64 | 0.0% | 1990.0 – 2024.0 (mean 2016.7384) |
| `surveytype` | object | 0.0% | |
| `denominatorweighted` | float64 | 23.2% | 11.0 – 12558.0 (mean 1081.1272) |
| `denominatorunweighted` | float64 | 23.2% | 26.0 – 10305.0 (mean 1077.9428) |
| `levelrank` | int64 | 0.0% | 1.0 – 2.0 (mean 1.7663) |
| `esa_source` | object | 0.0% | |
| `esa_processed` | object | 0.0% | |

---

## Numeric Summary

| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `dataid` | 880.0 | 7981394.0 | 4192770.3321 | 4160296.0 |
| `value` | 0.0 | 269.0 | 34.6533 | 24.2 |
| `precision` | 0.0 | 1.0 | 0.9275 | 1.0 |
| `surveyyear` | 1990.0 | 2024.0 | 2016.7384 | 2018.0 |
| `indicatororder` | 11763080.0 | 260321010.0 | 111507601.915 | 94096040.0 |
| `characteristicid` | 433001.0 | 433066.0 | 433030.3393 | 433026.0 |
| `characteristicorder` | 1433001.0 | 1433066.0 | 1433036.2393 | 1433035.0 |
| `byvariableid` | 0.0 | 631002.0 | 30069.0283 | 0.0 |
| `istotal` | 0.0 | 0.0 | 0.0 | 0.0 |
| `ispreferred` | 0.0 | 1.0 | 0.8817 | 1.0 |
| `surveyyearlabel` | 1990.0 | 2024.0 | 2016.7384 | 2018.0 |
| `denominatorweighted` | 11.0 | 12558.0 | 1081.1272 | 550.0 |
| `denominatorunweighted` | 26.0 | 10305.0 | 1077.9428 | 590.0 |
| `levelrank` | 1.0 | 2.0 | 1.7663 | 2.0 |

---

## Curation

Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. Two columns with more than 80% missing values were removed: `cilow` and `cihigh`. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.

---

## Limitations

- Data originates from The DHS Program and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- The following columns have >20% missing values and should be treated with caution in modelling: `byvariablelabel`, `denominatorweighted`, `denominatorunweighted`.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/dhs-subnational-data-for-nigeria) for the publisher's own methodology notes and caveats.

---

## Citation

```bibtex
@dataset{hdx_africa_demographics_nigeria,
  title  = {Nigeria - Subnational Demographic and Health Data},
  author = {The DHS Program},
  year   = {2026},
  url    = {https://data.humdata.org/dataset/dhs-subnational-data-for-nigeria},
  note   = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```

---

*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica): Africa's ML dataset infrastructure. Lagos, Nigeria.*
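
The cleaning steps listed under Curation can be approximated in pandas as shown here. This is a rough sketch under stated assumptions, not ESA's actual pipeline: the function name `curate` is hypothetical, the snake_case step is simplified to lowercasing and replacing spaces, the missing-value markers are matched exactly and case-sensitively, and `DataFrame.sample` stands in for whatever splitter was really used. Only the marker list, the 80% threshold, the 80/20 ratio and the seed (42) come from the card. The toy input is synthetic.

```python
import numpy as np
import pandas as pd

# Marker list as stated in the Curation section.
MISSING_MARKERS = ["N/A", "null", "none", "-", "unknown", "no data", "#N/A"]


def curate(raw: pd.DataFrame, max_missing: float = 0.8, seed: int = 42):
    """Hypothetical re-creation of the described cleaning and 80/20 split."""
    df = raw.copy()
    # Lowercase column names (simplified stand-in for full snake_case logic).
    df.columns = [str(c).strip().lower().replace(" ", "_") for c in df.columns]
    # Unify the listed missing-value markers to NaN (exact matches only).
    df = df.replace(MISSING_MARKERS, np.nan)
    # Drop columns whose share of missing values exceeds the threshold.
    missing_share = df.isna().mean()
    df = df.loc[:, missing_share <= max_missing]
    # 80/20 split with a fixed seed; the remainder becomes the test set.
    train = df.sample(frac=0.8, random_state=seed)
    test = df.drop(train.index)
    return train, test


# Synthetic input: "CILow" is 90% missing, so it should be dropped.
raw = pd.DataFrame(
    {
        "Location": ["A"] * 10,
        "Value": [float(i) for i in range(10)],
        "CILow": ["N/A"] * 9 + ["0.5"],
    }
)
train, test = curate(raw)
print(list(train.columns), len(train), len(test))
```

Writing the result with `DataFrame.to_parquet(path, compression="snappy")` would match the storage format the card describes, assuming a Parquet engine such as pyarrow is installed.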

Provider:
electricsheepafrica