electricsheepafrica/african-national-id-coverage
Source: Hugging Face. Updated 2026-03-20; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/african-national-id-coverage
---
license: cc-by-4.0
task_categories:
- tabular-classification
- tabular-regression
language:
- en
tags:
- governance
- digital-identity
- national-id
- biometric
- sub-saharan-africa
- synthetic
- id4d
- civil-registration
- lmic
- digital-public-infrastructure
pretty_name: African National ID Coverage
size_categories:
- 10K<n<100K
configs:
- config_name: baseline
  data_files: data/baseline.csv
  default: true
- config_name: accelerated_dpi
  data_files: data/accelerated_dpi.csv
- config_name: fragmented_systems
  data_files: data/fragmented_systems.csv
---
# African National ID Coverage
## Abstract
A synthetic dataset modeling national identification and civil registration coverage across 12 sub-Saharan African countries (2015–2025), parameterized from World Bank ID4D reports, national identity authority statistics (NIMC, NIA, NIDA), and academic studies on digital identity systems. The dataset contains 10,000 records for each of three digital public infrastructure scenarios (baseline, accelerated_dpi, fragmented_systems), with 26 variables covering ID enrollment, biometric capture, birth registration, digital ID penetration, service utilization, exclusion risks, costs, and composite inclusion indices. It is designed for ML classification, regression, and digital identity research in the governance and development domains.
## 1. Introduction
An estimated 500 million Africans lack any form of government-issued identification, creating barriers to financial inclusion, healthcare access, social protection, and civic participation. The World Bank's Identification for Development (ID4D) initiative estimates 850 million people globally lack proof of identity, with Sub-Saharan Africa disproportionately affected.
Recent progress has been dramatic: Nigeria's National Identity Management Commission (NIMC) enrolled 124 million people by October 2025, up from 104 million in December 2023. Ghana's National Identification Authority (NIA) issued 18.5 million Ghanacards by February 2026. Kenya is transitioning from Huduma Namba to Maisha Namba with automatic ID issuance at age 18 linked to digital birth registration. Rwanda's national ID system achieves 80%+ coverage with 95% biometric capture.
However, significant challenges persist. DRC and Mozambique have less than 20% ID coverage. Enrollment rates in rural and remote areas are 50% lower than in urban centers. Women, the elderly, and persons with disabilities face higher exclusion risks. Biometric systems fail 5–15% of the time due to poor image quality, and foreign vendors dominate the market for the systems that hold Africa's most sensitive identity data.
This dataset fills a critical gap: no equivalent ML-ready dataset on HuggingFace exists for national ID and civil registration coverage in Africa, despite strong demand from World Bank teams, digital identity researchers, fintech companies, and development practitioners.
## 2. Methodology
### 2.1 Target Population
Subnational (region-level) national ID and civil registration records for 12 sub-Saharan African countries spanning 2015–2025, across four region types (urban, peri-urban, rural, remote rural).
**Countries included:**
- **Advanced systems:** South Africa, Kenya, Rwanda
- **Developing systems:** Nigeria, Ghana, Tanzania, Uganda, Senegal, Benin (partially advanced)
- **Early stage:** Ethiopia, DRC, Mozambique
### 2.2 Variable Selection
Variables follow the World Bank ID4D framework, adapted with UNECA Digital Identity Landscape indicators and extended with biometric quality, exclusion risk, and cost metrics from recent implementation studies.
### 2.3 Parameterization Evidence Table
| Parameter | Value Used | Source | DOI/URL | Year | Note |
|-----------|-----------|--------|---------|------|------|
| Nigeria NIN enrollment | 124M (Oct 2025) | NIMC via Technext | technext24.com | 2025 | 117.3M (Mar 2025) → 124M (Oct) |
| Nigeria NIN growth | 10M new/year | NIMC statistics | technext24.com | 2025 | Consistent annual growth |
| Nigeria gender split | 56.5% male, 43.5% female | NIMC | voiceofnigeria.org | 2025 | Gender gap in enrollment |
| Ghana Ghanacard enrolled | 19.3M | Ghana NIA | nia.gov.gh | 2026 | 19.2M printed, 18.5M issued |
| Kenya birth registration | 90% digital | Biometric Update | biometricupdate.com | 2025 | Maisha Namba integration |
| Kenya Huduma budget cut | 84% reduction | Citizenship Rights Africa | citizenshiprightsafrica.org | 2023 | Sh680M → Sh106M |
| SSA ID coverage | <10% rural, varies urban | Atlantic Council | atlanticcouncil.org | 2025 | ~500M Africans without ID |
| Biometric system countries | 49 African countries | Atlantic Council | atlanticcouncil.org | 2025 | Foreign vendor dominated |
| Global without ID | 850 million | World Bank ID4D | worldbank.org | 2022 | Annual Report 2022 |
| Exclusion risks | Elderly, women, disabled | IDS/African Digital Rights | theconversation.com | 2026 | Biometric exclusion study |
| Digital ID benefits | Service access improvement | World Bank Findex | worldbank.org | 2024 | Trends in Access to ID |
| Birth registration SSA | Varies 20-90% | World Bank ID4D | worldbank.org | 2023 | Annual Report 2023 |
### 2.4 Scenario Design
| Scenario | Description | Enrollment Mult | Digital Mult | Exclusion Mult | Target Coverage |
|----------|-------------|-----------------|--------------|----------------|-----------------|
| **baseline** | Current SSA national ID landscape (2015–2025) | 1.0× | 1.0× | 1.0× | ~0.50 |
| **accelerated_dpi** | Digital public infrastructure push (ID4D-inspired reforms) | 1.35× | 1.8× | 0.6× | ~0.70 |
| **fragmented_systems** | Weak institutions, poor interoperability, legacy systems | 0.7× | 0.5× | 1.5× | ~0.30 |
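
One way to read the multiplier columns is as scaling factors applied to a baseline rate, with the result clipped to the bounds given in the schema (Section 3.1). The sketch below assumes exactly that. The function name `apply_scenario`, the clipping step, and the example input rates are illustrative assumptions and are not taken from the dataset's generator.

```python
# Multipliers copied from the scenario table above.
SCENARIOS = {
    "baseline":           {"enrollment": 1.0,  "digital": 1.0, "exclusion": 1.0},
    "accelerated_dpi":    {"enrollment": 1.35, "digital": 1.8, "exclusion": 0.6},
    "fragmented_systems": {"enrollment": 0.7,  "digital": 0.5, "exclusion": 1.5},
}

# Bounds from the schema ranges of id_enrollment_rate, digital_id_rate
# and exclusion_risk.
BOUNDS = {
    "enrollment": (0.05, 0.98),
    "digital": (0.05, 0.95),
    "exclusion": (0.02, 0.60),
}


def apply_scenario(base_rates: dict, scenario: str) -> dict:
    """Scale baseline rates by a scenario's multipliers, then clip.

    Multiplicative scaling followed by clipping is an assumption.
    """
    mults = SCENARIOS[scenario]
    out = {}
    for key, base in base_rates.items():
        lo, hi = BOUNDS[key]
        out[key] = min(max(base * mults[key], lo), hi)
    return out


# Hypothetical baseline rates for a single region-year.
base = {"enrollment": 0.50, "digital": 0.40, "exclusion": 0.10}
for name in SCENARIOS:
    print(name, apply_scenario(base, name))
```

Clipping matters mostly for the accelerated scenario, where a high baseline enrollment rate multiplied by 1.35 would otherwise exceed the schema maximum of 0.98.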
### 2.5 Generation Process
The generator follows a directed acyclic graph (DAG) with topological sampling order:
1. **Root nodes** (sampled independently): country, year (2015–2025), region_type
2. **Intermediate nodes** (sampled conditionally): population, eligible_population, id_enrollment_rate, persons_with_id, biometric_capture_rate, biometric_enrolled, birth_registration_rate, births_registered, digital_id_rate, digital_id_holders, id_utilization_rate, service_access_gained, exclusion_risk, persons_excluded, enrollment_cost_usd, processing_time_days
3. **Leaf nodes** (derived): id_coverage_pct, biometric_coverage_pct, birth_registration_pct, digital_id_penetration_pct, inclusion_index, id_system_maturity
Key techniques:
- Region-based adjustments model urban-rural gradients (urban areas have higher enrollment, lower exclusion risk, faster processing)
- Digital ID penetration drives utilization and reduces exclusion (r ≈ −0.55)
- Biometric capture quality affects exclusion risk (r ≈ −0.60)
- Year-on-year growth rates (3% enrollment, 6% digital adoption) capture temporal trends
- Birth registration correlates with national ID enrollment (r ≈ 0.65)
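
The conditional sampling along one DAG edge can be sketched as follows, using the enrollment and birth registration pair (target r ≈ 0.65). This is a minimal illustration, not the dataset's generator: the latent-normal construction, the uniform marginals on the schema ranges, and the names `sample_pair`, `normal_cdf`, and `pearson` are all assumptions made for the example.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample_pair(rng, rho, x_range, y_range):
    """Draw (parent, child) rates whose latent normals have correlation rho."""
    z_parent = rng.gauss(0.0, 1.0)
    # The child depends on its parent plus independent noise (one DAG edge).
    z_child = rho * z_parent + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    # Map each latent value onto its schema range through the normal CDF.
    x = x_range[0] + (x_range[1] - x_range[0]) * normal_cdf(z_parent)
    y = y_range[0] + (y_range[1] - y_range[0]) * normal_cdf(z_child)
    return x, y


rng = random.Random(42)
pairs = [sample_pair(rng, 0.65, (0.05, 0.98), (0.10, 0.99)) for _ in range(20_000)]
enrollment, birth_reg = zip(*pairs)
r = pearson(enrollment, birth_reg)
print(f"observed r = {r:.3f}")
```

Mapping through the normal CDF preserves the ordering of the latent values but slightly shrinks the Pearson correlation, so the observed r will sit a little below the latent rho.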
## 3. Dataset Description
### 3.1 Schema
| Column | Type | Units | Range | Description |
|--------|------|-------|-------|-------------|
| record_id | int | — | 1–10,000 | Unique record identifier |
| country | categorical | — | 12 countries | Sub-Saharan African country |
| year | int | year | 2015–2025 | Observation year |
| region_type | categorical | — | 4 types | urban, peri_urban, rural, remote_rural |
| population_millions | float | millions | varies | Total population |
| eligible_population_millions | float | millions | varies | Population eligible for national ID (age 16+) |
| id_enrollment_rate | float | ratio | 0.05–0.98 | Enrolled / eligible population |
| persons_with_id_millions | float | millions | varies | Persons with national ID |
| biometric_capture_rate | float | ratio | 0.20–1.0 | Biometric enrolled / persons with ID |
| biometric_enrolled_millions | float | millions | varies | Persons with biometric ID |
| birth_registration_rate | float | ratio | 0.10–0.99 | Registered births / total births |
| births_registered_thousands | float | thousands | varies | Registered births |
| digital_id_rate | float | ratio | 0.05–0.95 | Digital ID holders / persons with ID |
| digital_id_holders_millions | float | millions | varies | Digital ID holders |
| id_utilization_rate | float | ratio | 0.10–0.90 | Active ID users / persons with ID |
| service_access_gained_millions | float | millions | varies | Service access via ID |
| exclusion_risk | float | ratio | 0.02–0.60 | Probability of ID exclusion |
| persons_excluded_millions | float | millions | varies | Persons excluded from ID system |
| enrollment_cost_usd | float | USD | 0.5–100 | Cost per enrollment |
| processing_time_days | int | days | 1–150 | Days to receive ID |
| id_coverage_pct | float | ratio | varies | ID coverage / eligible population |
| biometric_coverage_pct | float | ratio | varies | Biometric / persons with ID |
| birth_registration_pct | float | ratio | varies | Registered / total births |
| digital_id_penetration_pct | float | ratio | varies | Digital ID / persons with ID |
| inclusion_index | float | score | 0.0–1.0 | Composite inclusion score |
| id_system_maturity | categorical | — | 4 levels | mature (≥0.70), developing (0.50–0.70), emerging (0.30–0.50), nascent (<0.30) |
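
The explicit ranges in the schema can double as a quick sanity check after loading a CSV. The sketch below uses only the standard library; the helper name `out_of_range` and the two sample rows are made up for illustration and do not come from the dataset.

```python
import csv
import io

# Ranges copied from the schema table for columns with explicit bounds.
RANGES = {
    "id_enrollment_rate": (0.05, 0.98),
    "biometric_capture_rate": (0.20, 1.0),
    "birth_registration_rate": (0.10, 0.99),
    "digital_id_rate": (0.05, 0.95),
    "id_utilization_rate": (0.10, 0.90),
    "exclusion_risk": (0.02, 0.60),
    "enrollment_cost_usd": (0.5, 100.0),
    "processing_time_days": (1, 150),
    "inclusion_index": (0.0, 1.0),
}


def out_of_range(row: dict) -> list:
    """Return the names of columns in `row` whose value falls outside RANGES."""
    bad = []
    for col, (lo, hi) in RANGES.items():
        if col in row and not (lo <= float(row[col]) <= hi):
            bad.append(col)
    return bad


# Two made-up rows standing in for data/baseline.csv; the second is
# deliberately invalid.
sample = io.StringIO(
    "record_id,id_enrollment_rate,exclusion_risk,processing_time_days\n"
    "1,0.62,0.08,30\n"
    "2,1.20,0.08,30\n"
)
for row in csv.DictReader(sample):
    print(row["record_id"], out_of_range(row))
```

Columns whose range is listed as "varies" are skipped, since the schema gives no bound to test against.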
### 3.2 Classification Criteria
| Class | Criteria | Real-World Analogue |
|-------|----------|-------------------|
| **mature** systems | inclusion_index ≥ 0.70 | Rwanda, South Africa, Kenya |
| **developing** systems | 0.50 ≤ inclusion_index < 0.70 | Nigeria, Ghana, Tanzania |
| **emerging** systems | 0.30 ≤ inclusion_index < 0.50 | Uganda, Senegal, Benin |
| **nascent** systems | inclusion_index < 0.30 | DRC, Mozambique, Ethiopia |
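
The thresholds above translate directly into a small labeling function. The function name `classify_maturity` is illustrative; the cut-offs are the ones stated in the table.

```python
def classify_maturity(inclusion_index: float) -> str:
    """Map an inclusion_index value to its id_system_maturity class."""
    if inclusion_index >= 0.70:
        return "mature"
    if inclusion_index >= 0.50:
        return "developing"
    if inclusion_index >= 0.30:
        return "emerging"
    return "nascent"


# Hypothetical index values, one per class.
for value in (0.82, 0.61, 0.35, 0.12):
    print(value, classify_maturity(value))
```

Lower bounds are inclusive, so a record with `inclusion_index` exactly equal to 0.50 is labeled developing rather than emerging.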
### 3.3 Summary Statistics (baseline scenario)
| Variable | Mean | SD | Min | Max |
|----------|------|-----|-----|-----|
| id_coverage_pct | 0.537 | 0.286 | 0.05 | 0.98 |
| biometric_coverage_pct | 0.733 | 0.211 | 0.20 | 1.00 |
| birth_registration_pct | 0.499 | 0.256 | 0.10 | 0.99 |
| digital_id_penetration_pct | 0.454 | 0.283 | 0.05 | 0.95 |
| exclusion_risk | 0.107 | 0.132 | 0.02 | 0.60 |
| inclusion_index | 0.610 | 0.229 | 0.00 | 1.00 |
## 4. Validation
### 4.1 Prevalence Fidelity
| Outcome | Target Range | Observed (baseline) | Status |
|---------|-------------|-------------------|--------|
| System: mature | 10–25% | 38.4% | FAIL |
| System: developing | 25–40% | 26.9% | PASS |
| System: emerging | 20–35% | 23.8% | PASS |
| System: nascent | 10–30% | 10.9% | PASS |
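
A prevalence check of this kind can be sketched as a comparison of class shares against the target ranges. The helper name `prevalence_report` is hypothetical, and the label list is constructed to reproduce the observed baseline shares from the table rather than read from the real file.

```python
from collections import Counter

# Target ranges in percent, copied from the table above.
TARGETS = {
    "mature": (10, 25),
    "developing": (25, 40),
    "emerging": (20, 35),
    "nascent": (10, 30),
}


def prevalence_report(labels):
    """Return {class: (share_in_percent, 'PASS' or 'FAIL')} for each target class."""
    counts = Counter(labels)
    total = len(labels)
    report = {}
    for cls, (lo, hi) in TARGETS.items():
        share = 100.0 * counts.get(cls, 0) / total
        report[cls] = (share, "PASS" if lo <= share <= hi else "FAIL")
    return report


# 1,000 stand-in labels matching the observed baseline shares
# (38.4%, 26.9%, 23.8%, 10.9%).
labels = (["mature"] * 384 + ["developing"] * 269
          + ["emerging"] * 238 + ["nascent"] * 109)
report = prevalence_report(labels)
for cls, (share, status) in report.items():
    print(f"{cls:<11} {share:5.1f}%  {status}")
```

With the real data, `labels` would be the `id_system_maturity` column of the baseline CSV.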
### 4.2 Distribution Quality
All continuous variables pass mean checks against literature benchmarks across all three scenarios.
### 4.3 Correlation Structure
| Pair | Target r | Observed r | Status |
|------|----------|-----------|--------|
| id_enrollment ↔ biometric_capture | 0.75 | 0.865 | PASS |
| id_enrollment ↔ birth_registration | 0.65 | 0.923 | FAIL |
| digital_id ↔ exclusion_risk | −0.55 | −0.708 | PASS |
| biometric_capture ↔ exclusion_risk | −0.60 | −0.839 | FAIL |
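
The card does not state the rule behind the PASS/FAIL column. The sketch below assumes an absolute tolerance of 0.20 around the target correlation; this value is an assumption that happens to reproduce the four statuses in the table, and other rules could reproduce them as well.

```python
# (variable_a, variable_b, target_r, observed_r), copied from the table above.
PAIRS = [
    ("id_enrollment", "biometric_capture", 0.75, 0.865),
    ("id_enrollment", "birth_registration", 0.65, 0.923),
    ("digital_id", "exclusion_risk", -0.55, -0.708),
    ("biometric_capture", "exclusion_risk", -0.60, -0.839),
]

TOLERANCE = 0.20  # assumed; not documented in the dataset card


def correlation_status(target: float, observed: float, tol: float = TOLERANCE) -> str:
    """PASS when the observed r is within `tol` of the target, else FAIL."""
    return "PASS" if abs(observed - target) <= tol else "FAIL"


statuses = [correlation_status(t, o) for _, _, t, o in PAIRS]
for (a, b, t, o), status in zip(PAIRS, statuses):
    print(f"{a} vs {b}: target {t:+.2f}, observed {o:+.3f}, {status}")
```

In every pair the observed correlation is stronger than the target, so models trained on this data may overstate how tightly these variables move together.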
### 4.4 Cross-Scenario Monotonicity
| Metric | Accelerated | Baseline | Fragmented | Monotonic? |
|--------|-------------|----------|-----------|-----------|
| id_coverage (mean) | 0.658 | 0.537 | 0.391 | Yes |
| inclusion_index (mean) | 0.731 | 0.610 | 0.438 | Yes |
| exclusion_risk (mean) | 0.061 | 0.107 | 0.208 | Yes |
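
The monotonicity check amounts to confirming that each metric moves in a single direction when scenarios are ordered accelerated, baseline, fragmented. A minimal sketch using the means from the table (the helper name `is_monotonic` is illustrative):

```python
# Scenario means in the order (accelerated, baseline, fragmented),
# copied from the table above.
MEANS = {
    "id_coverage": (0.658, 0.537, 0.391),
    "inclusion_index": (0.731, 0.610, 0.438),
    "exclusion_risk": (0.061, 0.107, 0.208),
}


def is_monotonic(values) -> bool:
    """True when the sequence is strictly increasing or strictly decreasing."""
    steps = list(zip(values, values[1:]))
    increasing = all(a < b for a, b in steps)
    decreasing = all(a > b for a, b in steps)
    return increasing or decreasing


results = {metric: is_monotonic(vals) for metric, vals in MEANS.items()}
for metric, ok in results.items():
    print(f"{metric}: {'Yes' if ok else 'No'}")
```

Coverage and inclusion fall from the accelerated to the fragmented scenario while exclusion risk rises, which is the direction the scenario multipliers imply.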
### 4.5 Diagnostic Plots

## 5. Usage
### 5.1 Loading with HuggingFace datasets
```python
from datasets import load_dataset
# Load baseline scenario (default)
ds = load_dataset("electricsheepafrica/african-national-id-coverage")
# Load specific scenario
ds = load_dataset("electricsheepafrica/african-national-id-coverage", "accelerated_dpi")
```
### 5.2 Loading directly from CSV
```python
import pandas as pd
df = pd.read_csv("data/baseline.csv")
print(df.shape)
print(df.describe())
```
### 5.3 Regenerating with custom parameters
```bash
pip install numpy pandas scipy matplotlib
python generate_dataset.py --scenario baseline --n 10000 --seed 42
python validate_dataset.py
```
## 6. Limitations & Ethical Considerations
1. **Synthetic data**: Not suitable for policy decisions, audit investigations, or official reporting.
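2. **Country-level aggregation**: Regions are represented only by four region types within each country, so variation between specific subnational units (states, provinces, districts) is not captured.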
3. **Biometric quality simplification**: Actual biometric failure rates vary by system, hardware, and population characteristics.
4. **Exclusion dimensions**: Gender, age, disability, and geographic exclusion are modeled as a composite risk rather than separately.
5. **Cost methodology**: Enrollment costs include official fees but exclude transport, opportunity costs, and informal payments.
6. **Privacy considerations**: No real personal data is included; all records are synthetically generated.
7. **Vendor dynamics**: The dataset does not model vendor-specific performance differences.
## 7. References
1. World Bank ID4D, *Annual Report 2023*.
2. World Bank, *Global Progress in Identification*, 2025.
3. NIMC, *Nigeria NIN Enrollment Statistics*, 2023–2025.
4. Ghana NIA, *Ghanacard Registration Statistics*, 2026.
5. Kenya, *Birth Registration Reforms and Maisha Namba*, 2025.
6. ECDPM, *Digital ID Systems in Africa: Challenges, Risks and Opportunities*, 2023.
7. UNECA, *Africa Digital Identity Landscape*, 2022.
8. Atlantic Council, *Biometrics and Digital Identity in Africa*, 2025.
9. The Conversation, *Biometric IDs in Africa: Risks and Pitfalls*, 2026.
10. Emurgo Africa, *Digital ID Initiatives by Country*, 2023.
11. World Bank Global Findex, *Trends in Access to ID in SSA*, 2024.
12. Research ICT Africa, *Datafication in Africa: Risks of Digital ID*, 2019.
## Citation
```bibtex
@dataset{esa_national_id_2026,
title={African National ID Coverage},
author={{Electric Sheep Africa}},
year={2026},
publisher={HuggingFace},
url={https://huggingface.co/datasets/electricsheepafrica/african-national-id-coverage},
license={CC-BY-4.0}
}
```
## License
CC-BY-4.0
Provider:
electricsheepafrica



