
electricsheepafrica/africa-unicef-esaro-regional-db-31-october-2017

Hugging Face · Updated 2026-04-06 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-unicef-esaro-regional-db-31-october-2017
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- eastern-africa
- funding
- ago
- bdi
- eri
- eth
- ken
pretty_name: "UNICEF ESARO Regional db 31 March 2018"
dataset_info:
  splits:
  - name: train
    num_examples: 14
  - name: test
    num_examples: 3
---

# UNICEF ESARO Regional db 31 March 2018

**Publisher:** UNICEF Eastern and Southern Africa Regional Office (ESARO) (inactive) · **Source:** [HDX](https://data.humdata.org/dataset/unicef-esaro-regional-db-31-october-2017) · **License:** `cc-by` · **Updated:** 2024-08-30

---

## Abstract

UNICEF Eastern and Southern Africa database: Target, Response and Funding as of 31 March 2018.

Each row in this dataset represents one tabular record. Temporal coverage is indicated by the `unnamed_1` column. Geographic scope: **AGO, BDI, ERI, ETH, KEN, MDG, SOM, SSD, and 1 other**.

*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*

---

## Dataset Characteristics

| | |
|---|---|
| **Domain** | Humanitarian and development data |
| **Unit of observation** | Tabular records |
| **Rows (total)** | 18 |
| **Columns** | 4 (0 numeric, 3 categorical, 1 datetime) |
| **Train split** | 14 rows |
| **Test split** | 3 rows |
| **Geographic scope** | AGO, BDI, ERI, ETH, KEN, MDG, SOM, SSD, and 1 other |
| **Publisher** | UNICEF Eastern and Southern Africa Regional Office (ESARO) (inactive) |
| **HDX last updated** | 2024-08-30 |

---

## Variables

**Identifier / Metadata:** `unnamed_1`, `esa_source` (HDX), `esa_processed` (2026-04-06).

**Other:** `instructions` (1. The sheets to be updated are individual country tabs (Som etc.) and the "funding details" tabs. The "situation" tab updated to reflect any changes in context or situational data., South Sudan, Madagascar).

---

## Quick Start

```python
from datasets import load_dataset

ds = load_dataset("electricsheepafrica/africa-unicef-esaro-regional-db-31-october-2017")
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
train.head()
```

---

## Schema

| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `instructions` | object | 5.6% | 1. The sheets to be updated are individual country tabs (Som etc.) and the "funding details" tabs. The "situation" tab updated to reflect any changes in context or situational data., South Sudan, Madagascar |
| `unnamed_1` | datetime64[ns] | 55.6% | |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-06 |

---

## Numeric Summary

| Column | Min | Max | Mean | Median |
|---|---|---|---|---|

_No numeric columns._

---

## Curation

Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. One column with >80% missing values was removed: `unnamed_2`. One exact duplicate row was removed. One column was cast from string to numeric or datetime based on parse-success rate (>85% threshold). The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.

---

## Limitations

- Data originates from UNICEF Eastern and Southern Africa Regional Office (ESARO) (inactive) and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- The following column has >20% missing values and should be treated with caution in modelling: `unnamed_1`.
- This dataset spans 9 countries; geographic and methodological inconsistencies across national boundaries may affect cross-country comparability.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/unicef-esaro-regional-db-31-october-2017) for the publisher's own methodology notes and caveats.

---

## Citation

```bibtex
@dataset{hdx_africa_unicef_esaro_regional_db_31_october_2017,
  title = {UNICEF ESARO Regional db 31 March 2018},
  author = {UNICEF Eastern and Southern Africa Regional Office (ESARO) (inactive)},
  year = {2024},
  url = {https://data.humdata.org/dataset/unicef-esaro-regional-db-31-october-2017},
  note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```

---

*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica): Africa's ML dataset infrastructure. Lagos, Nigeria.*
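The cleaning steps listed under Curation can be approximated in pandas. The following is a minimal sketch, not the curators' actual pipeline: the function names `curate` and `split_train_test`, the toy input frame, and the case-insensitive matching of missing-value markers are all assumptions made for illustration, and the string-to-datetime casting step is left out.

```python
import numpy as np
import pandas as pd

# Markers listed in the Curation section; matching them case-insensitively
# is an assumption of this sketch.
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def curate(raw: pd.DataFrame, max_missing: float = 0.80) -> pd.DataFrame:
    """Hypothetical re-creation of the described cleaning steps."""
    df = raw.copy()
    # Lowercase the column names and convert spaces to underscores.
    df.columns = [str(c).strip().lower().replace(" ", "_") for c in df.columns]

    # Replace recognised missing-value markers with NaN.
    def to_nan(value):
        if isinstance(value, str) and value.strip().lower() in MISSING_MARKERS:
            return np.nan
        return value

    df = df.apply(lambda col: col.map(to_nan))
    # Drop columns whose share of missing values exceeds the threshold.
    df = df.loc[:, df.isna().mean() <= max_missing]
    # Drop exact duplicate rows.
    return df.drop_duplicates().reset_index(drop=True)


def split_train_test(df: pd.DataFrame, test_frac: float = 0.20, seed: int = 42):
    """80/20 split with a fixed seed; using DataFrame.sample is an assumption."""
    test = df.sample(frac=test_frac, random_state=seed)
    train = df.drop(test.index)
    return train, test


# Toy input (invented for illustration, not taken from the dataset).
raw = pd.DataFrame(
    {
        "Instructions": ["a", "b", "c", "d", "e", "f", "g", "h", "i", "i"],
        "Unnamed 1": ["2018-03-31", "N/A", "-", "2018-03-31", "null",
                      "x", "y", "z", "w", "w"],
        "Unnamed 2": ["-"] * 10,
    }
)

clean = curate(raw)
train, test = split_train_test(clean)
print(clean.columns.tolist(), len(train), len(test))
```

With `DataFrame.sample(frac=0.20)`, a 17-row frame yields 3 test rows and 14 training rows, which is consistent with the split sizes reported on the card, though the card does not say which splitting routine was used.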

Provider:
electricsheepafrica