
electricsheepafrica/africa-world-bank-health-indicators-for-kenya

Hugging Face · Updated 2026-04-09 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-world-bank-health-indicators-for-kenya
Description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- tabular-classification
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- health
- indicators
- ken
pretty_name: "Kenya - Health"
dataset_info:
  splits:
  - name: train
    num_examples: 8127
  - name: test
    num_examples: 2031
---

# Kenya - Health

**Publisher:** World Bank Group · **Source:** [HDX](https://data.humdata.org/dataset/world-bank-health-indicators-for-kenya) · **License:** `cc-by` · **Updated:** 2026-03-27

---

## Abstract

Contains data from the World Bank's [data portal](http://data.worldbank.org/). There is also a [consolidated country dataset](https://data.humdata.org/dataset/world-bank-combined-indicators-for-kenya) on HDX.

Improving health is central to the Millennium Development Goals, and the public sector is the main provider of health care in developing countries. To reduce inequities, many countries have emphasized primary health care, including immunization, sanitation, access to safe drinking water, and safe motherhood initiatives. Data here cover health systems, disease prevention, reproductive health, nutrition, and population dynamics. Data are from the United Nations Population Division, World Health Organization, United Nations Children's Fund, the Joint United Nations Programme on HIV/AIDS, and various other sources.

Each row in this dataset represents a country-level aggregate. The data were last updated on HDX on 2026-03-27. Geographic scope: **KEN**.

*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*

---

## Dataset Characteristics

| | |
|---|---|
| **Domain** | Public health |
| **Unit of observation** | Country-level aggregates |
| **Rows (total)** | 10,159 |
| **Columns** | 8 (2 numeric, 6 categorical, 0 datetime) |
| **Train split** | 8,127 rows |
| **Test split** | 2,031 rows |
| **Geographic scope** | KEN |
| **Publisher** | World Bank Group |
| **HDX last updated** | 2026-03-27 |

---

## Variables

**Geographic:** `country_name` (Kenya), `country_iso3` (KEN), `year` (range 1960.0–2025.0).

**Outcome / Measurement:** `value` (range -115436.0–56432944.0).

**Identifier / Metadata:** `indicator_name` (sample values: "Net migration"; "Population ages 40-44, female (% of female population)"; "Population ages 0-14, male"), `indicator_code` (sample values: SM.POP.NETM, SP.POP.4044.FE.5Y, SP.POP.0014.MA.IN), `esa_source` (HDX), `esa_processed` (2026-04-09).

---

## Quick Start

```python
from datasets import load_dataset

# Download both splits from the Hugging Face Hub
ds = load_dataset("electricsheepafrica/africa-world-bank-health-indicators-for-kenya")

# Convert each split to a pandas DataFrame
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())
```

---

## Schema

| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `country_name` | object | 0.0% | Kenya |
| `country_iso3` | object | 0.0% | KEN |
| `year` | int64 | 0.0% | 1960.0 – 2025.0 (mean 1998.779) |
| `indicator_name` | object | 0.0% | Net migration; Population ages 40-44, female (% of female population); Population ages 0-14, male |
| `indicator_code` | object | 0.0% | SM.POP.NETM, SP.POP.4044.FE.5Y, SP.POP.0014.MA.IN |
| `value` | float64 | 0.0% | -115436.0 – 56432944.0 (mean 701039.4678) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-09 |

---

## Numeric Summary

| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `year` | 1960.0 | 2025.0 | 1998.779 | 2003.0 |
| `value` | -115436.0 | 56432944.0 | 701039.4678 | 29.6966 |

---

## Curation

Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.

---

## Limitations

- Data originates from the World Bank Group and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/world-bank-health-indicators-for-kenya) for the publisher's own methodology notes and caveats.

---

## Citation

```bibtex
@dataset{hdx_africa_world_bank_health_indicators_for_kenya,
  title = {Kenya - Health},
  author = {World Bank Group},
  year = {2026},
  url = {https://data.humdata.org/dataset/world-bank-health-indicators-for-kenya},
  note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```

---

*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica): Africa's ML dataset infrastructure. Lagos, Nigeria.*
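
The card describes a long-format table: one row per indicator and year, with every indicator sharing the single `value` column. Net migration counts and population percentages therefore sit side by side, which is likely why the reported mean of `value` (701039.4678) is so far from its median (29.6966). A minimal sketch of selecting one indicator, or pivoting to one column per indicator, might look like the following. The helper names `indicator_series` and `to_wide` are hypothetical, and the `toy` frame holds invented placeholder numbers that only mimic the schema; they are not values from the dataset.

```python
import pandas as pd


def indicator_series(df: pd.DataFrame, code: str) -> pd.Series:
    """Return one indicator as a year-indexed Series, sorted by year."""
    subset = df.loc[df["indicator_code"] == code, ["year", "value"]]
    return subset.set_index("year")["value"].sort_index()


def to_wide(df: pd.DataFrame) -> pd.DataFrame:
    """Pivot the long table to one row per year and one column per indicator code."""
    return df.pivot_table(
        index="year",
        columns="indicator_code",
        values="value",
        aggfunc="first",  # assumes at most one value per (year, indicator) pair
    )


# Placeholder rows with the card's column names; the numbers are invented.
toy = pd.DataFrame(
    {
        "year": [2001, 2000, 2000, 2001],
        "indicator_code": [
            "SM.POP.NETM",
            "SM.POP.NETM",
            "SP.POP.0014.MA.IN",
            "SP.POP.0014.MA.IN",
        ],
        "value": [-2.0, -1.0, 10.0, 11.0],
    }
)

netm = indicator_series(toy, "SM.POP.NETM")
wide = to_wide(toy)
print(netm)
print(wide)
```

With the real data, the same helpers could be applied to the `train` frame from the Quick Start. Because the train/test split is random across rows, a single indicator's time series will generally be spread over both partitions, so concatenating them first may be preferable for time-series work.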

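
The curation steps named in the card (unifying missing-value markers to `NaN`, then an 80/20 split with seed 42) can be approximated as below. This is a sketch under stated assumptions, not the publisher's actual pipeline: the card does not show its code, so the case-insensitive matching, the use of `DataFrame.sample` for the split, and the helper names `unify_missing` and `split_80_20` are all illustrative choices. The `toy` frame contains invented values.

```python
import numpy as np
import pandas as pd

# Markers listed in the card, lowercased for case-insensitive matching (an assumption).
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def unify_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Replace known missing-value markers in non-numeric columns with NaN."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            continue  # numeric columns cannot contain the text markers
        normalized = out[col].astype(str).str.strip().str.lower()
        out[col] = out[col].mask(normalized.isin(MISSING_MARKERS), np.nan)
    return out


def split_80_20(df: pd.DataFrame, seed: int = 42):
    """Randomly split rows 80/20 using a fixed seed."""
    df = df.reset_index(drop=True)  # unique index so drop() removes exactly the train rows
    train = df.sample(frac=0.8, random_state=seed)
    test = df.drop(train.index)
    return train, test


# Invented placeholder data, used only to exercise the helpers.
toy = pd.DataFrame(
    {
        "indicator_name": ["a", "N/A", "b", " no data ", "c", "d", "e", "f", "g", "h"],
        "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    }
)

clean = unify_missing(toy)
train, test = split_80_20(clean)
print(len(train), len(test))
```

Writing each partition with `DataFrame.to_parquet(path, compression="snappy")` would match the Snappy-compressed Parquet output the card mentions, provided a Parquet engine such as pyarrow is installed.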
Provider:
electricsheepafrica