electricsheepafrica/africa-mozambique-school-data
Hugging Face · Updated 2026-04-20 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-mozambique-school-data
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- other
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- education
- education-facilities-schools
- geodata
- moz
pretty_name: "Mozambique: Schools"
dataset_info:
splits:
- name: train
num_examples: 10408
- name: test
num_examples: 2602
---
# Mozambique: Schools
**Publisher:** OCHA Mozambique · **Source:** [HDX](https://data.humdata.org/dataset/mozambique-school-data) · **License:** `cc-by` · **Updated:** 2025-05-05
---
## Abstract
Mozambique school data at the lowest administrative level available.
Each row is a single geolocated point observation of a school. Data was last updated on HDX on 2025-05-05. Geographic scope: **MOZ** (Mozambique).
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Education |
| **Unit of observation** | Geolocated point observations |
| **Rows (total)** | 13,010 |
| **Columns** | 27 (6 numeric, 21 categorical, 0 datetime) |
| **Train split** | 10,408 rows |
| **Test split** | 2,602 rows |
| **Geographic scope** | MOZ |
| **Publisher** | OCHA Mozambique |
| **HDX last updated** | 2025-05-05 |
---
## Variables
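Column names and category labels are in Portuguese.
**Administrative hierarchy**: `provincia` (Zambézia, Nampula, Tete), `distrito` (Milange, Gurué, Mocuba), `posto` (Milange sede, Nauela, Gurué Sede), `localidade` (Cidade de Mocuba, Beira, Sede), `povoado` (Nenhum, Cidade de Nampula, Cidade de Mocuba).
**Geographic**: `longitude` (range 30.2394 to 40.8219), `latitude` (range -26.8445 to -10.5675).
**School identifiers**: `cod_escola` (range 8104.0 to 620414.0), `nomeesco_1` (school name).
**Water and energy**: `agua`, `tipofonte`, `qualiagua`, `energia`, `tipoenergi`.
**Sanitation (latrines, urinals, bathrooms)**: `latrina`, `totallatri` (range 0.0 to 58.0), `estado_de_conservação3`, `urinois`, `totalurino` (range 0.0 to 30.0), `urinoisfun`, `estado_de_conservação4`, `casabanho`, `totalcb` (range 0.0 to 31.0), `cbfunci`.
**Other**: `zip` (Nenhum, Quême, Nihesiue).
**Metadata**: `esa_source`, `esa_processed`.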
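`longitude` and `latitude` are about 0.1% null, so it can help to screen coordinates before mapping. The sketch below shows one way to do this. It assumes rows arrive as Python dicts (for example from pandas `train.to_dict("records")`) with missing values stored as `None` or `NaN`. The bounds are simply the observed ranges quoted above, so every non-missing row in this release should pass. The check matters mainly for missing values, or for merged data where the two fields may have been swapped. `coord_status` and the range constants are hypothetical names.

```python
import math
from collections import Counter

# Observed ranges from the Variables section, in decimal degrees.
LON_RANGE = (30.2394, 40.8219)
LAT_RANGE = (-26.8445, -10.5675)


def _missing(x):
    """True for None or a float NaN."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def _within(x, bounds):
    """True if x lies inside the closed interval `bounds`."""
    return bounds[0] <= x <= bounds[1]


def coord_status(row):
    """Label a row's coordinates as 'ok', 'missing', 'swapped' or 'out_of_range'."""
    lon, lat = row.get("longitude"), row.get("latitude")
    if _missing(lon) or _missing(lat):
        return "missing"
    if _within(lon, LON_RANGE) and _within(lat, LAT_RANGE):
        return "ok"
    # All observed latitudes are negative (south of the equator), so a pair that
    # only fits after exchanging the two fields was probably entered swapped.
    if _within(lat, LON_RANGE) and _within(lon, LAT_RANGE):
        return "swapped"
    return "out_of_range"


rows = [
    {"longitude": 36.18, "latitude": -17.60},
    {"longitude": None, "latitude": None},
    {"longitude": -17.60, "latitude": 36.18},
]
print(Counter(coord_status(r) for r in rows))
```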
---
## Quick Start
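```python
from datasets import load_dataset

# Download both splits from the Hugging Face Hub
ds = load_dataset("electricsheepafrica/africa-mozambique-school-data")

# Convert each split to a pandas DataFrame
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)   # (rows, columns) of the training split
print(train.head())  # first five rows; print() is needed outside a notebook
```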
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `provincia` | object | 0.0% | Zambézia, Nampula, Tete |
| `distrito` | object | 0.0% | Milange, Gurué, Mocuba |
| `posto` | object | 0.0% | Milange sede, Nauela, Gurué Sede |
| `localidade` | object | 0.0% | Cidade de Mocuba, Beira, Sede |
| `povoado` | object | 0.1% | Nenhum, Cidade de Nampula, Cidade de Mocuba |
| `cod_escola` | float64 | 13.9% | 8104.0 – 620414.0 (mean 67206.6694) |
| `nomeesco_1` | object | 0.0% | Escola Primária Completa Eduardo Mondlane, Escola Primária Samora Machel, Escola Primária Completa 25 de Junho |
| `zip` | object | 0.5% | Nenhum, Quême, Nihesiue |
| `agua` | object | 0.0% | Não, Sim |
| `tipofonte` | object | 0.0% | Sim, Nenhuma Fonte, Furo com bomba |
| `qualiagua` | object | 0.0% | Nenhuma, Potável, Não Potável |
| `energia` | object | 0.0% | |
| `tipoenergi` | object | 0.0% | |
| `latrina` | object | 0.0% | |
| `totallatri` | float64 | 0.0% | 0.0 – 58.0 (mean 2.2881) |
| `estado_de_conservação3` | object | 0.0% | |
| `urinois` | object | 0.0% | |
| `totalurino` | float64 | 0.0% | 0.0 – 30.0 (mean 1.1347) |
| `urinoisfun` | object | 0.0% | |
| `estado_de_conservação4` | object | 0.0% | |
| `casabanho` | object | 0.0% | |
| `totalcb` | float64 | 0.0% | 0.0 – 31.0 (mean 1.1361) |
| `cbfunci` | object | 0.0% | |
| `longitude` | float64 | 0.1% | 30.2394 – 40.8219 (mean 36.1759) |
| `latitude` | float64 | 0.1% | -26.8445 – -10.5675 (mean -17.6031) |
| `esa_source` | object | 0.0% | |
| `esa_processed` | object | 0.0% | |
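The schema shows `agua` coded with the Portuguese labels `Sim` (yes) and `Não` (no). The sketch below converts such columns to booleans. It assumes that other flag-like columns (for example `energia` or `latrina`) use the same coding. The table above does not confirm this, because their sample values are blank. Unrecognised values map to `None`. Treating the accent-free spelling `nao` as "no" is also an assumption. `decode_yes_no` and `decode_columns` are hypothetical helpers.

```python
import unicodedata

# Labels observed in `agua`; the accent-free "nao" variant is an ASSUMPTION.
YES_NO = {"sim": True, "não": False, "nao": False}


def decode_yes_no(value):
    """Map 'Sim'/'Não' (any case, NFC or NFD accents) to True/False, else None."""
    if not isinstance(value, str):
        return None
    key = unicodedata.normalize("NFC", value.strip()).lower()
    return YES_NO.get(key)


def decode_columns(rows, columns):
    """Return copies of `rows` with each listed column decoded to a boolean."""
    decoded = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            new_row[col] = decode_yes_no(row.get(col))
        decoded.append(new_row)
    return decoded


rows = [{"agua": "Sim"}, {"agua": "Não"}, {"agua": None}]
print([r["agua"] for r in decode_columns(rows, ["agua"])])  # [True, False, None]
```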
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `cod_escola` | 8104.0 | 620414.0 | 67206.6694 | 70135.0 |
| `totallatri` | 0.0 | 58.0 | 2.2881 | 2.0 |
| `totalurino` | 0.0 | 30.0 | 1.1347 | 0.0 |
| `totalcb` | 0.0 | 31.0 | 1.1361 | 0.0 |
| `longitude` | 30.2394 | 40.8219 | 36.1759 | 35.9839 |
| `latitude` | -26.8445 | -10.5675 | -17.6031 | -16.3178 |
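The card does not state how these statistics were computed. The sketch below reproduces the same quantities for a single column. It assumes that missing values (`None`/`NaN`) are skipped, which is what pandas does by default. It also returns the missing share, which corresponds to the Null % column in the schema. `cod_escola` is an identifier, so its mean and median carry no real meaning. `numeric_summary` is a hypothetical name, and the sample values are illustrative, not real rows.

```python
import math
import statistics


def numeric_summary(values):
    """Min, max, mean, median and null % of a column, skipping None and NaN."""
    clean = [v for v in values if v is not None and not math.isnan(v)]
    if not clean:
        return None
    return {
        "min": min(clean),
        "max": max(clean),
        "mean": round(statistics.fmean(clean), 4),
        "median": statistics.median(clean),
        "null_pct": round(100 * (len(values) - len(clean)) / len(values), 1),
    }


totalurino_sample = [0.0, 0.0, 1.0, 3.0, None]  # hypothetical values
print(numeric_summary(totalurino_sample))
# {'min': 0.0, 'max': 3.0, 'mean': 1.0, 'median': 0.5, 'null_pct': 20.0}
```

The table can be read the same way. For `totalurino` and `totalcb`, the median is 0.0 and so is the minimum. This means at least half of the non-missing rows record zero, presumably zero urinals and zero bathrooms respectively.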
---
## Curation
Raw data was downloaded from HDX via the CKAN API and processed as follows:
- Column names were lowercased and standardised to snake_case.
- Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`.
- 12 exact duplicate rows were removed.
- Two columns were cast from string to numeric. The criterion was a parse-success rate above 85%. The same rule applies to datetime casts, but the dataset has no datetime columns.
- The data was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
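The curation steps can be approximated as below. This is a sketch under stated assumptions, not Electric Sheep Africa's actual pipeline:

- Marker matching is assumed to be case-insensitive after trimming whitespace.
- The 85% parse rate is assumed to be measured over non-missing values.
- `None` stands in for `NaN`.
- The seeded shuffle uses Python's `random` module, so it will not reproduce the published train/test membership. The split sizes do match: 20% of 13,010 rows is 2,602.

The Parquet write is omitted. All function names are hypothetical.

```python
import random

# Markers from the Curation section. ASSUMPTION: matched case-insensitively after strip().
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def normalise_missing(value):
    """Return None (standing in for NaN) if `value` is a known missing marker."""
    if isinstance(value, str) and value.strip().lower() in MISSING_MARKERS:
        return None
    return value


def drop_exact_duplicates(rows):
    """Keep the first occurrence of each fully identical row (values must be hashable)."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def try_float(value):
    """Parse `value` as a float, or return None if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def maybe_cast_numeric(values, threshold=0.85):
    """Cast a string column to floats if MORE than `threshold` of non-missing values parse.

    ASSUMPTION: the rate is computed over non-missing values; values that still
    fail to parse in a cast column become None.
    """
    present = [v for v in values if v is not None]
    if not present:
        return values
    rate = sum(try_float(v) is not None for v in present) / len(present)
    if rate <= threshold:
        return values
    return [None if v is None else try_float(v) for v in values]


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` with a fixed seed and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


raw = [
    {"povoado": "Nenhum", "totallatri": "2"},
    {"povoado": "N/A", "totallatri": "3"},
    {"povoado": "N/A", "totallatri": "3"},  # exact duplicate of the previous row
]
rows = [{k: normalise_missing(v) for k, v in r.items()} for r in raw]
rows = drop_exact_duplicates(rows)
latrines = maybe_cast_numeric([r["totallatri"] for r in rows])
print(rows)      # [{'povoado': 'Nenhum', 'totallatri': '2'}, {'povoado': None, 'totallatri': '3'}]
print(latrines)  # [2.0, 3.0]
```

Portuguese placeholders such as `Nenhum` (visible in `povoado` and `zip`) are not in the marker list. This step therefore leaves them untouched, which matches what the schema shows.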
---
## Limitations
- Data originates from OCHA Mozambique and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/mozambique-school-data) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_mozambique_school_data,
title = {Mozambique: Schools},
author = {OCHA Mozambique},
year = {2025},
url = {https://data.humdata.org/dataset/mozambique-school-data},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provided by:
electricsheepafrica



