electricsheepafrica/africa-education-namibia
Hugging Face · Updated 2026-04-27 · Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-education-namibia
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-sa-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
- other
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- education
- health-facilities
- transportation
- nam
pretty_name: "Namibia - Accessibility Indicators"
dataset_info:
  splits:
  - name: train
    num_examples: 1872
  - name: test
    num_examples: 468
---
# Namibia - Accessibility Indicators
**Publisher:** HeiGIT (Heidelberg Institute for Geoinformation Technology) · **Source:** [HDX](https://data.humdata.org/dataset/namibia-accessibility-indicators) · **License:** `cc-by-sa` · **Updated:** 2026-02-27
---
## Abstract
This dataset provides insights into spatial accessibility to healthcare and
education services across Namibia. It has been created using free and
open tools such as [openrouteservice](https://openrouteservice.org/) and open
data sources, primarily [OpenStreetMap](https://www.openstreetmap.org/) (OSM).
To assess accessibility to education and healthcare, we use travel-time
isochrones—polygons representing areas reachable within a given time or distance by
car. We overlay these isochrones with [WorldPop](https://www.worldpop.org/) population
data, which provides 100m-resolution estimates. This allows us to calculate the
population within time intervals from 10 to 120 minutes away from hospital services and
distance intervals from 5 to 50 km away from schools. The unit of analysis is defined
by [geoboundaries](https://www.geoboundaries.org/) country borders, and where available
we also summarise results at finer administrative levels (ADM 1–4).
Data Structure:
- **name**: Region or country name.
- **iso**: ISO3 country code.
- **id**: Unique identifier for the administrative unit.
- **country**: ISO3 country code.
- **admin_level**: Administrative level of the unit.
- **category**: Service category — `education`, `hospitals` or
`primary_healthcare`.
- **range_type**: Method used for the catchment zone — `distance` or `time`.
- **range**: Distance (in meters) or travel time (in seconds) from the facility
(schools, for the `education` category) used to generate the polygon.
- **population**: Total population within the specified range.
- **school_age_population**: Number of school-age individuals within the range.
- **school_age_population_share**: Cumulative percentage of school-age
population.
- **school_age_population_interval**: Incremental school-age population added
in the current distance band.
- **school_age_population_interval_share**: Proportion of new school-age
population in the current interval.
- **population_share**: Cumulative percentage of total population.
- **population_interval**: Incremental population added in the current distance
band.
- **population_interval_share**: Share of the total population represented by
the current interval.
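
The cumulative and interval columns described above are related: `population_interval` is the population added in the current band, so it should equal the difference between consecutive cumulative `population` values. The following is a minimal sketch of that relationship with pandas. The numbers are invented, the helper name `add_interval_columns` is hypothetical, and two assumptions are made that the source does not confirm: the first band's interval equals its cumulative count, and shares are taken against the unit's total population.

```python
import pandas as pd

# Toy cumulative counts for one hypothetical unit; all values are invented.
toy = pd.DataFrame({
    "range": [5000, 10000, 15000],   # meters from the nearest school
    "population": [400, 700, 1000],  # cumulative population within range
})

def add_interval_columns(df: pd.DataFrame, total: float) -> pd.DataFrame:
    """Derive interval and share columns from cumulative counts."""
    out = df.sort_values("range").reset_index(drop=True)
    # Assumption: the first band has no predecessor, so its interval
    # equals its cumulative count.
    out["population_interval"] = out["population"].diff().fillna(out["population"])
    # Assumption: shares are percentages of the unit's total population.
    out["population_share"] = 100 * out["population"] / total
    out["population_interval_share"] = 100 * out["population_interval"] / total
    return out

result = add_interval_columns(toy, total=1000)
print(result)
```

Under these assumptions the interval shares sum to the final cumulative share, which can serve as a quick consistency check on real rows.
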
This dataset is one of many [HeiGIT exports on HDX](https://data.humdata.org/organization/heidelberg-institute-for-geoinformation-technology).
See the [HeiGIT](https://heigit.org/) website for more information.
We are looking forward to hearing about your use-case! Feel free to reach out
to us and tell us about your research at
[communications@heigit.org](mailto:communications@heigit.org) – we would be
happy to amplify your work.
References:
- [Geldsetzer, P., Reinmuth, M., Ouma, P. O., Lautenbach, S. et al. (2020)](https://www.thelancet.com/journals/lanhl/article/PIIS2666-7568(20)30010-6/fulltext)
- [Petricola, S., Reinmuth, M., Lautenbach, S. et al. (2022)](https://ij-healthgeographics.biomedcentral.com/articles/10.1186/s12942-022-00315-2)
- [Klipper, I. G., Zipf, A., and Lautenbach, S. (2021)](https://agile-giss.copernicus.org/articles/2/4/2021/)
- [Ruiz Sánchez, R., Reinmuth, M., Albornoz, C., Lautenbach, S., and Zipf, A. (2025)](https://agile-giss.copernicus.org/articles/6/10/2025/)
Further Information:
- [Open Access Lens](https://giscience.github.io/open-access-lens/#/)
**Limitations**:
* **OSM Completeness**: This analysis relies on OpenStreetMap (OSM) data. While OSM is
the most complete open map of the world, data quality varies significantly by region.
In areas with unmapped roads or facilities, accessibility may be underestimated.
* **Population Estimates**: Population counts are derived from WorldPop top-down
estimates (constrained). These are statistical models based on census projections and
satellite imagery, not direct census counts, and may contain inaccuracies at the local
pixel level.
* **Travel Time Assumptions**: Isochrones are calculated using standard vehicle speeds
for different road types. These models do not account for real-time traffic, seasonal
weather conditions (e.g., flooding), or road surface degradation.
* **Boundary Precision**: Administrative boundaries are sourced from geoBoundaries.
These may differ slightly from official government demarcations or other schemas.
Each row in this dataset is an aggregate for one administrative unit; the `admin_level` values in this export are ADM0 (country), ADM1 and ADM2. Data was last updated on HDX on 2026-02-27. Geographic scope: **NAM**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Public health |
| **Unit of observation** | Administrative-unit aggregates (ADM0, ADM1, ADM2) |
| **Rows (total)** | 2,340 |
| **Columns** | 14 (5 numeric, 9 categorical, 0 datetime) |
| **Train split** | 1,872 rows |
| **Test split** | 468 rows |
| **Geographic scope** | NAM |
| **Publisher** | HeiGIT (Heidelberg Institute for Geoinformation Technology) |
| **HDX last updated** | 2026-02-27 |
---
## Variables
**Geographic**: `country` (NAM), `admin_level` (ADM2, ADM1, ADM0), `category` (education), `range_type` (DISTANCE), `population_type` (school_age, total), `population`, `population_share`, `population_interval`, `population_interval_share`.
**Identifier / Metadata**: `name` (Luderitz, Karas, Kalahari), `id` (8085530B86358564716630, 8085530B72153402463843, 8085530B71951838310116), `esa_source` (HDX), `esa_processed` (2026-04-27).
**Other**: `range` (5000.0–50000.0, in meters).
---
## Quick Start
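```python
from datasets import load_dataset

# Download both splits from the Hugging Face Hub.
ds = load_dataset("electricsheepafrica/africa-education-namibia")

# Convert each split to a pandas DataFrame.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)   # (rows, columns)
print(train.head())  # first five rows
```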
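
Because the data is in long format (one row per unit, range and population type), most analyses start with a filter. The sketch below selects the country-level rows for one population type and orders them by range. The helper name `country_profile` is hypothetical, and the stand-in frame uses invented numbers with the dataset's column names so the example runs without a download.

```python
import pandas as pd

def country_profile(df: pd.DataFrame, population_type: str = "school_age") -> pd.DataFrame:
    """Return country-level (ADM0) rows for one population type, ordered by range."""
    mask = (df["admin_level"] == "ADM0") & (df["population_type"] == population_type)
    cols = ["range", "population", "population_share"]
    return df.loc[mask, cols].sort_values("range").reset_index(drop=True)

# Stand-in frame with the dataset's column names; the numbers are invented.
sample = pd.DataFrame({
    "admin_level": ["ADM0", "ADM0", "ADM1", "ADM0"],
    "population_type": ["school_age", "school_age", "school_age", "total"],
    "range": [10000, 5000, 5000, 5000],
    "population": [150, 100, 40, 300],
    "population_share": [75.0, 50.0, 20.0, 60.0],
})

profile = country_profile(sample)
print(profile)
```

Since the train/test split is random, the rows for a single unit are likely spread across both splits. For descriptive analysis it may be simpler to call `pd.concat([train, test])` first and filter the combined frame.
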
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `name` | object | 0.0% | Luderitz, Karas, Kalahari |
| `id` | object | 0.0% | 8085530B86358564716630, 8085530B72153402463843, 8085530B71951838310116 |
| `country` | object | 0.0% | NAM |
| `admin_level` | object | 0.0% | ADM2, ADM1, ADM0 |
| `category` | object | 0.0% | education |
| `range_type` | object | 0.0% | DISTANCE |
| `range` | int64 | 0.0% | 5000.0 – 50000.0 m (mean 27500.0) |
| `population_type` | object | 0.0% | school_age, total |
| `population` | int64 | 0.0% | 22.0 – 1900195.0 (mean 29338.6064) |
| `population_share` | float64 | 0.0% | 0.89 – 100.0 (mean 55.5838) |
| `population_interval` | int64 | 0.0% | 0.0 – 1368977.0 (mean 3451.8329) |
| `population_interval_share` | float64 | 0.0% | 0.0 – 100.0 (mean 6.7706) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-27 |
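
A loaded frame can be checked against the schema table above before use. The following is a minimal sketch, not an official validator: the helper name `check_schema` is hypothetical, the column list is copied from the table, and the only value check is that the two share columns stay within 0 to 100, as the table's ranges suggest.

```python
import pandas as pd

# Column names copied from the schema table above.
EXPECTED_COLUMNS = [
    "name", "id", "country", "admin_level", "category", "range_type",
    "range", "population_type", "population", "population_share",
    "population_interval", "population_interval_share",
    "esa_source", "esa_processed",
]
SHARE_COLUMNS = ["population_share", "population_interval_share"]

def check_schema(df: pd.DataFrame) -> list[str]:
    """Return a list of readable problems; an empty list means no issue was found."""
    problems = [f"missing column: {c}" for c in EXPECTED_COLUMNS if c not in df.columns]
    for col in SHARE_COLUMNS:
        # Shares are percentages, so anything outside 0-100 is suspicious.
        if col in df.columns and not df[col].between(0, 100).all():
            problems.append(f"{col} has values outside 0-100")
    return problems

# One-row stand-in frames; the values are placeholders, not real data.
good = pd.DataFrame([{c: 0 for c in EXPECTED_COLUMNS}])
bad = good.drop(columns=["id"]).assign(population_share=150.0)

print(check_schema(good))
print(check_schema(bad))
```
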
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `range` | 5000.0 | 50000.0 | 27500.0 | 27500.0 |
| `population` | 22.0 | 1900195.0 | 29338.6064 | 7310.0 |
| `population_share` | 0.89 | 100.0 | 55.5838 | 51.63 |
| `population_interval` | 0.0 | 1368977.0 | 3451.8329 | 124.0 |
| `population_interval_share` | 0.0 | 100.0 | 6.7706 | 0.87 |
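
The summary above can be recomputed from the loaded data. In the table, the mean of `population` (29338.6064) is far above its median (7310.0), which suggests a right-skewed distribution in which a few large units dominate. The sketch below shows one way to produce the same four statistics with pandas; the helper name `numeric_summary` is hypothetical and the toy frame contains invented values.

```python
import pandas as pd

NUMERIC_COLUMNS = [
    "range", "population", "population_share",
    "population_interval", "population_interval_share",
]

def numeric_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Min, max, mean and median per numeric column, one row per column."""
    present = [c for c in NUMERIC_COLUMNS if c in df.columns]
    return df[present].agg(["min", "max", "mean", "median"]).T

# Toy frame with invented values; the real frame has all five columns.
toy_numeric = pd.DataFrame({
    "range": [5000, 27500, 50000],
    "population": [10, 20, 90],
})

summary = numeric_summary(toy_numeric)
print(summary)
```
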
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. One column with more than 80% missing values, `iso`, was removed. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
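
The curation steps described above can be approximated in a few lines of pandas. This is a sketch under stated assumptions, not the actual Electric Sheep Africa pipeline: the helper names `to_snake_case` and `curate` are hypothetical, marker matching is exact and case-sensitive, and the split uses `DataFrame.sample`, which may differ from the tool the curators used. The CKAN download and Parquet export are omitted.

```python
import re
import pandas as pd

# Missing-value markers listed in the curation notes.
MISSING_MARKERS = ["N/A", "null", "none", "-", "unknown", "no data", "#N/A"]

def to_snake_case(name: str) -> str:
    """Lowercase a column name and join its word runs with underscores."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()

def curate(raw: pd.DataFrame, max_missing: float = 0.8, seed: int = 42):
    """Rename columns, unify missing markers, drop sparse columns, split 80/20."""
    df = raw.rename(columns=to_snake_case)
    # Replace cells that exactly match a marker with NaN.
    df = df.mask(df.isin(MISSING_MARKERS))
    # Keep only columns whose missing fraction does not exceed the threshold.
    df = df.loc[:, df.isna().mean() <= max_missing]
    # 80/20 split with a fixed seed; assumes a unique index.
    train = df.sample(frac=0.8, random_state=seed)
    test = df.drop(train.index)
    return train, test

# Invented raw frame: the "ISO" column is 90% missing markers.
raw = pd.DataFrame({
    "Name": [f"unit_{i}" for i in range(10)],
    "ISO": ["NAM"] + ["N/A"] * 9,
    "Population Share": [float(i) for i in range(10)],
})

train_part, test_part = curate(raw)
print(train_part.shape, test_part.shape)
```
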
---
## Limitations
- Data originates from HeiGIT (Heidelberg Institute for Geoinformation Technology) and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/namibia-accessibility-indicators) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_education_namibia,
title = {Namibia - Accessibility Indicators},
author = {HeiGIT (Heidelberg Institute for Geoinformation Technology)},
year = {2026},
url = {https://data.humdata.org/dataset/namibia-accessibility-indicators},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider:
electricsheepafrica



