electricsheepafrica/african-internet-penetration-district
Source: Hugging Face. Updated 2026-03-21; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/african-internet-penetration-district
Resource description:
---
license: cc-by-4.0
tags:
- digital
- internet
- connectivity
- sub-saharan-africa
- synthetic
- telecom
- infrastructure
language:
- en
pretty_name: African Internet Penetration District Dataset
size_categories:
- 10K<n<100K
task_categories:
- tabular-regression
- tabular-classification
---
# African Internet Penetration District Dataset
Synthetic district-level internet connectivity data for 12 Sub-Saharan African countries across three policy scenarios (2018–2025).
## Dataset Description
This dataset provides granular, district-level estimates of internet penetration and digital infrastructure metrics across Sub-Saharan Africa. It is fully synthetic, generated with realistic statistical distributions calibrated against public data from ITU, GSMA, World Bank, and national regulatory authorities.
### Countries
| Country | Baseline average internet penetration |
|---------|---------------------------------------|
| South Africa | ~70% |
| Nigeria | ~55% |
| Kenya | ~65% |
| DR Congo | ~25% |
| Ghana | ~53% |
| Tanzania | ~40% |
| Ethiopia | ~20% |
| Uganda | ~35% |
| Rwanda | ~45% |
| Senegal | ~42% |
| Cote d'Ivoire | ~38% |
| Mozambique | ~15% |
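The card does not say how the per-country averages in this table were computed (simple mean over districts, population-weighted, or for which years). The sketch below computes both variants from the baseline rows so they can be compared with the table. The function name `country_baseline_averages` is hypothetical, and only column names from the Variables table are used.

```python
import pandas as pd


def country_baseline_averages(df: pd.DataFrame) -> pd.DataFrame:
    """Per-country internet penetration for the baseline scenario.

    Returns both a simple mean over rows and a population-weighted mean,
    because the card does not state which one the table reports.
    """
    base = df[df["scenario"] == "baseline"]

    # Simple mean: every district-year row counts equally.
    simple = base.groupby("country")["internet_penetration_pct"].mean()

    # Weighted mean: sum(pct * population) / sum(population) per country.
    weighted_sum = (
        (base["internet_penetration_pct"] * base["population"])
        .groupby(base["country"])
        .sum()
    )
    total_pop = base.groupby("country")["population"].sum()

    return pd.DataFrame(
        {"simple_mean_pct": simple, "pop_weighted_pct": weighted_sum / total_pop}
    )
```

Calling `country_baseline_averages(df)` on the combined file averages over all years; filter on `year` first if a single-year figure is wanted.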
### Scenarios
- **baseline**: Business-as-usual growth trajectory
- **infrastructure_push**: Aggressive infrastructure investment (+12 percentage points internet penetration, +40% download speed, -30% data cost)
- **digital_divide**: Widening digital gap (-8 percentage points internet penetration, -30% download speed, +30% data cost)
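One way to see how far each scenario departs from baseline is to line the scenarios up side by side and subtract. The sketch below is a minimal version of that, assuming that `country`, `district`, and `year` together identify a row within a scenario (the card does not confirm this). The function name `scenario_deltas` is hypothetical. The observed differences may not match the stated offsets exactly if the generator adds noise or clips values, which the card does not describe.

```python
import pandas as pd

# Assumed join keys; not confirmed by the dataset card.
KEYS = ["country", "district", "year"]


def scenario_deltas(
    df: pd.DataFrame, metric: str = "internet_penetration_pct"
) -> pd.DataFrame:
    """Difference between each non-baseline scenario and baseline for one metric."""
    # One row per (country, district, year), one column per scenario.
    wide = df.pivot_table(
        index=KEYS, columns="scenario", values=metric, aggfunc="mean"
    )
    # Subtract the baseline column from every scenario column, row by row.
    return wide.sub(wide["baseline"], axis=0).drop(columns="baseline")
```

For example, `scenario_deltas(df).mean()` gives the average shift per scenario in the metric's own units (percentage points for `internet_penetration_pct`).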
### Variables
| Variable | Type | Description |
|----------|------|-------------|
| `record_id` | int | Unique record identifier |
| `country` | str | Country name |
| `district` | str | Province/state/district name |
| `year` | int | Year (2018–2025) |
| `population` | int | District population estimate |
| `internet_penetration_pct` | float | Internet penetration rate (%) |
| `mobile_broadband_pct` | float | Mobile broadband subscription rate (%) |
| `fixed_broadband_pct` | float | Fixed broadband subscription rate (%) |
| `urban_rural` | str | Urban or rural classification |
| `avg_download_speed_mbps` | float | Average download speed (Mbps) |
| `data_cost_per_gb_usd` | float | Mobile data cost per GB (USD) |
| `smartphone_penetration_pct` | float | Smartphone ownership rate (%) |
| `2g_coverage_pct` | float | 2G network coverage (%) |
| `3g_coverage_pct` | float | 3G network coverage (%) |
| `4g_coverage_pct` | float | 4G/LTE network coverage (%) |
| `5g_coverage_pct` | float | 5G network coverage (%) |
| `digital_literacy_score` | float | Digital literacy composite score (0–100) |
| `affordability_index` | float | Affordability index (0–100) |
| `connectivity_gap_index` | float | Connectivity gap index (0–100, higher = worse) |
| `scenario` | str | Policy scenario label |
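The table above can be turned into a quick sanity check after loading. The sketch below is not the repository's `validate_dataset.py` (its contents are not shown in this card). It is a hypothetical helper, `check_schema`, that tests only what the table states: the column names, the 2018–2025 year range, and the 0–100 range of the three score and index columns.

```python
import pandas as pd

EXPECTED_COLUMNS = [
    "record_id", "country", "district", "year", "population",
    "internet_penetration_pct", "mobile_broadband_pct", "fixed_broadband_pct",
    "urban_rural", "avg_download_speed_mbps", "data_cost_per_gb_usd",
    "smartphone_penetration_pct", "2g_coverage_pct", "3g_coverage_pct",
    "4g_coverage_pct", "5g_coverage_pct", "digital_literacy_score",
    "affordability_index", "connectivity_gap_index", "scenario",
]

# Columns the card documents as lying in the range 0-100.
BOUNDED_0_100 = [
    "digital_literacy_score", "affordability_index", "connectivity_gap_index",
]


def check_schema(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")

    # Series.between is inclusive on both ends; NaN values count as out of range.
    if "year" in df.columns and not df["year"].between(2018, 2025).all():
        problems.append("year outside 2018-2025")

    for col in BOUNDED_0_100:
        if col in df.columns and not df[col].between(0, 100).all():
            problems.append(f"{col} outside 0-100")

    return problems
```

Percentage columns such as `mobile_broadband_pct` are deliberately left unchecked, since the card gives no explicit bounds for them.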
## Files
- `data/african_internet_penetration_district.csv` — Combined dataset (all scenarios)
- `data/african_internet_penetration_baseline.csv` — Baseline scenario
- `data/african_internet_penetration_infrastructure_push.csv` — Infrastructure push scenario
- `data/african_internet_penetration_digital_divide.csv` — Digital divide scenario
- `data/african_internet_penetration_district.parquet` — Parquet format (all scenarios)
- `generate_dataset.py` — Generation script
- `validate_dataset.py` — Validation script
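If only some scenarios are needed, the per-scenario CSVs can be read and stacked instead of loading the combined file. The sketch below assumes the three per-scenario files share the combined file's columns, including `scenario`, which the card implies but does not state. The helper name `load_scenarios` is hypothetical; the file names come from the list above.

```python
from pathlib import Path

import pandas as pd

SCENARIOS = ("baseline", "infrastructure_push", "digital_divide")


def load_scenarios(data_dir="data", scenarios=SCENARIOS) -> pd.DataFrame:
    """Read the requested per-scenario CSV files and concatenate them."""
    frames = []
    for name in scenarios:
        path = Path(data_dir) / f"african_internet_penetration_{name}.csv"
        frames.append(pd.read_csv(path))
    # ignore_index gives the stacked frame a fresh 0..n-1 row index.
    return pd.concat(frames, ignore_index=True)
```

For instance, `load_scenarios(scenarios=["baseline", "digital_divide"])` loads two of the three scenarios from the default `data` directory.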
## Usage
```python
import pandas as pd

# Load the combined file (all three scenarios)
df = pd.read_csv("data/african_internet_penetration_district.csv")
print(df.describe())

# Filter by country and scenario
sa_baseline = df[(df["country"] == "South Africa") & (df["scenario"] == "baseline")]
```
## Generation
```bash
pip install -r requirements.txt
python generate_dataset.py
python validate_dataset.py
```
## License
Creative Commons Attribution 4.0 International (CC-BY-4.0).
Provided by:
electricsheepafrica



