
electricsheepafrica/african-land-registry-digitization

Source: Hugging Face. Updated: 2026-03-20. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/african-land-registry-digitization
Resource description:
---
license: cc-by-4.0
task_categories:
- tabular-classification
- tabular-regression
language:
- en
tags:
- governance
- land-registry
- property-rights
- digitization
- sub-saharan-africa
- synthetic
- cadastral
- tenure-security
- lmic
- prop-tech
pretty_name: African Land Registry Digitization
size_categories:
- 10K<n<100K
configs:
- config_name: baseline
  data_files: data/baseline.csv
  default: true
- config_name: reform_progress
  data_files: data/reform_progress.csv
- config_name: legacy_systems
  data_files: data/legacy_systems.csv
---

# African Land Registry Digitization

## Abstract

A synthetic dataset modeling land registry digitization and tenure formalization across 12 sub-Saharan African countries (2010–2025), parameterized from World Bank assessments, national land authority reports, and academic studies. The dataset contains 10,000 records for each of three digitization scenarios (baseline, reform_progress, legacy_systems), with 22 variables covering cadastral coverage, title issuance, digitalization levels, processing times, registration costs, land dispute rates, and composite efficiency scores. It is designed for ML classification, regression, and property rights research in the governance and PropTech domains.

## 1. Introduction

Land tenure security remains one of the most critical governance challenges across sub-Saharan Africa. An estimated 90% of rural land in Africa is undocumented, and even in urban areas, formal title coverage rarely exceeds 30%. The consequences are severe: land disputes consume up to 50% of court caseloads in some countries, investment is deterred by unclear property rights, and women's land rights remain particularly vulnerable.

Recent digitization efforts have shown transformative potential. Rwanda's systematic land tenure regularization (2009–2013) registered 11.4 million parcels at just USD 6 per parcel and achieved 97% cadastral coverage. Kenya's digital land registry rollout (2026) introduced a "click, verify, own" platform. Uganda's National Land Information System (NaLIS) established 22 one-stop Ministry Zonal Offices. However, in Ghana, 98% of land remains unregistrable, and Nigeria's digitization covers only major urban centers.

This dataset fills a significant gap: no equivalent ML-ready dataset on HuggingFace exists for land registry digitization in Africa, despite strong demand from World Bank teams, PropTech companies, land governance researchers, and development finance institutions.

## 2. Methodology

### 2.1 Target Population

Subnational (region-level) land registry records for 12 sub-Saharan African countries spanning 2010–2025, across four region types (urban, peri-urban, rural, remote rural).

**Countries included:**

- **Advanced reform:** Rwanda, Kenya, Botswana, South Africa
- **Moderate reform:** Ghana, Tanzania, Uganda, Senegal
- **Early stage:** Nigeria, DRC, Mozambique, Ethiopia

### 2.2 Variable Selection

Variables follow the UN-Habitat Global Land Indicator Initiative framework, adapted with indicators from the World Bank's Land Governance Assessment Framework (LGAF) and extended with digitalization metrics from recent e-governance studies.

### 2.3 Parameterization

All parameters are grounded in peer-reviewed literature and official reports.
The source hierarchy follows:

| Priority | Source Type | Examples Used |
|----------|-------------|---------------|
| 1 | World Bank project documents | Rwanda LTR Case Study, Sierra Leone LAP, LGAF Reports |
| 2 | National land authority reports | Rwanda NLA e-Title, Kenya Digital Registry, Uganda NaLIS |
| 3 | Academic studies | Byamugisha & Dubosse (2023), Ehwi & Asante (2016) |
| 4 | International assessments | Esri GIS Reports, Rights and Resources, Atlantic Council |

#### Parameterization Evidence Table

| Parameter | Value Used | Source | DOI/URL | Year | Note |
|-----------|------------|--------|---------|------|------|
| Rwanda parcels demarcated | 11.4M of 11.5M (99%) | World Bank Rwanda LTR | gov.uk/research | 2019 | 3-year national rollout |
| Rwanda unit cost | USD 6 per parcel | World Bank | openknowledge.worldbank.org | 2019 | Aerial photography-based |
| Rwanda titles issued | 8.8 million | Rwanda NLA | topafricanews.com | 2023 | 100% digital registry |
| Rwanda e-Title launch | January 2023 | Rwanda NLA | environment.gov.rw | 2023 | Zero trips, zero paper |
| Kenya digital registry | 2026 rollout | News Science Africa | african-realestate.com | 2026 | Click, verify, own platform |
| Ghana unregistrable land | 98% | Justice Osei-Tutu | theghanareport.com | 2026 | Only Accra/Kumasi registered |
| Uganda NaLIS | 22 MZOs established | CEDP Uganda | cedp.go.ug | 2024 | One-stop zonal offices |
| Angola GIS modernization | 68% urbanization | Esri | esri.com | 2024 | GIS-based land admin |
| SSA title coverage | <10% rural, <30% urban | Atlantic Council | atlanticcouncil.org | 2020 | Customary tenure dominant |
| Land dispute court burden | Up to 50% of cases | Byamugisha & Dubosse | cambridge.org | 2023 | Cost-benefit analysis |
| Investment in tenure security | High ROI | Cambridge JBCA | cambridge.org | 2023 | Cost-benefit analysis |
| Customary tenure share | 70-90% of land | Rights and Resources | rightsandresources.org | 2015 | Brief #1 of 5 |

### 2.4 Scenario Design

| Scenario | Description | Cadastral Mult | Title Mult | Digital Mult | Cost Mult | Target Title Coverage |
|----------|-------------|----------------|------------|--------------|-----------|-----------------------|
| **baseline** | Current SSA land registry landscape (2010–2025) | 1.0× | 1.0× | 1.0× | 1.0× | ~0.25 |
| **reform_progress** | Active land reform programs (e.g., Rwanda, Kenya post-2015) | 1.4× | 1.5× | 1.6× | 0.6× | ~0.45 |
| **legacy_systems** | Weak institutions with paper-based, fragmented registries | 0.7× | 0.6× | 0.5× | 1.5× | ~0.10 |

### 2.5 Generation Process

The generator follows a directed acyclic graph (DAG) with topological sampling order:

1. **Root nodes** (sampled independently): country (weighted by population), year (uniform 2010–2025), region_type
2. **Intermediate nodes** (sampled conditionally): total_land_parcels, cadastral_coverage_pct, parcels_surveyed, title_issuance_rate, titles_issued, digitalization_level, digital_records_pct, processing_time_days, registration_cost_usd, dispute_rate, land_disputes, resolution_rate, disputes_resolved
3. **Leaf nodes** (derived): title_coverage_pct, formal_tenure_pct, customary_tenure_pct, registration_efficiency, digitization_class

Key techniques:

- Region-based adjustment factors model urban-rural gradients (urban areas have higher cadastral coverage, lower costs, faster processing)
- Digitalization reduces processing times (70% reduction at full digital) and costs (50% reduction)
- Higher title coverage reduces dispute rates (r ≈ −0.70) per World Bank evidence
- Year-on-year growth rates (2% for cadastral, 4% for digitalization) capture temporal trends

## 3. Dataset Description

### 3.1 Schema

| Column | Type | Units | Range | Description |
|--------|------|-------|-------|-------------|
| record_id | int | n/a | 1–10,000 | Unique record identifier |
| country | categorical | n/a | 12 countries | Sub-Saharan African country |
| year | int | year | 2010–2025 | Observation year |
| region_type | categorical | n/a | 4 types | urban, peri_urban, rural, remote_rural |
| total_land_parcels_millions | float | millions | 0.1–100 | Total land parcels in region |
| cadastral_coverage_pct | float | ratio | 0.01–0.99 | Parcels surveyed / total parcels |
| parcels_surveyed_millions | float | millions | varies | Parcels with cadastral surveys |
| title_issuance_rate | float | ratio | 0.01–0.95 | Titles issued / parcels surveyed |
| titles_issued_millions | float | millions | varies | Parcels with formal titles |
| title_coverage_pct | float | ratio | 0.01–0.95 | Titles issued / total parcels |
| digitalization_level | float | ratio | 0.05–1.0 | Digital records / total records |
| digital_records_pct | float | ratio | varies | Digital title records share |
| processing_time_days | int | days | 7–500 | Average days for title issuance |
| registration_cost_usd | float | USD | 1–800 | Cost per parcel registration |
| dispute_rate | float | ratio | 0.01–0.50 | Land disputes / total parcels |
| land_disputes_millions | float | millions | varies | Active land disputes |
| resolution_rate | float | ratio | 0.10–0.90 | Disputes resolved / total disputes |
| disputes_resolved_millions | float | millions | varies | Resolved land disputes |
| formal_tenure_pct | float | ratio | 0.05–0.95 | Formal tenure share |
| customary_tenure_pct | float | ratio | 0.05–0.95 | Customary tenure share |
| registration_efficiency | float | score | 0.0–1.0 | Composite efficiency score |
| digitization_class | categorical | n/a | 4 levels | advanced (≥0.80), developing (0.55–0.80), early (0.30–0.55), paper_based (<0.30) |

### 3.2 Classification Criteria

| Class | Criteria | Real-World Analogue |
|-------|----------|---------------------|
| **advanced** digitalization | digitalization_level ≥ 0.80 | Rwanda e-Title, Kenya digital registry |
| **developing** digitalization | 0.55 ≤ digitalization_level < 0.80 | Botswana, South Africa systems |
| **early** digitalization | 0.30 ≤ digitalization_level < 0.55 | Ghana, Tanzania pilot systems |
| **paper_based** digitalization | digitalization_level < 0.30 | DRC, Mozambique, rural Nigeria |

### 3.3 Summary Statistics (baseline scenario)

| Variable | Mean | SD | Min | Max |
|----------|------|-----|-----|-----|
| cadastral_coverage_pct | 0.349 | 0.312 | 0.01 | 0.99 |
| title_coverage_pct | 0.162 | 0.254 | 0.01 | 0.95 |
| digitalization_level | 0.546 | 0.306 | 0.05 | 1.00 |
| dispute_rate | 0.135 | 0.038 | 0.01 | 0.50 |
| processing_time_days | 169 | 120 | 7 | 500 |
| registration_cost_usd | 197.52 | 156.60 | 1 | 800 |
| registration_efficiency | 0.394 | 0.234 | 0.00 | 1.00 |

## 4. Validation

### 4.1 Prevalence Fidelity

| Outcome | Target Range | Observed (baseline) | Status |
|---------|--------------|---------------------|--------|
| Digitization: advanced | 8–20% | 27.3% | FAIL |
| Digitization: developing | 20–35% | 17.0% | FAIL |
| Digitization: early | 25–40% | 28.5% | PASS |
| Digitization: paper_based | 15–35% | 27.2% | PASS |

Note: Prevalence targets were derived from expert estimates; observed distribution reflects parameter ranges across countries.

### 4.2 Distribution Quality

All continuous variables pass mean checks against literature benchmarks across all three scenarios. Some standard deviations exceed target ranges due to the high heterogeneity across countries and region types.
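The class thresholds in Section 3.2 and the prevalence check in Section 4.1 can be reproduced with a few lines of pandas. The block below is a minimal sketch, assuming left-closed bins as written in the criteria table; the helper names `classify` and `prevalence_report` and the demo values are hypothetical and are not taken from the project's `generate_dataset.py` or `validate_dataset.py`.

```python
import pandas as pd

# Thresholds from Section 3.2; bins are left-closed, e.g. [0.55, 0.80).
BINS = [0.0, 0.30, 0.55, 0.80, float("inf")]
LABELS = ["paper_based", "early", "developing", "advanced"]

# Target prevalence ranges from Section 4.1, expressed as fractions.
TARGETS = {
    "advanced": (0.08, 0.20),
    "developing": (0.20, 0.35),
    "early": (0.25, 0.40),
    "paper_based": (0.15, 0.35),
}


def classify(levels: pd.Series) -> pd.Series:
    """Map digitalization_level values to the four digitization classes."""
    return pd.cut(levels, bins=BINS, labels=LABELS, right=False).astype(str)


def prevalence_report(classes: pd.Series) -> pd.DataFrame:
    """Compare observed class shares with the target ranges."""
    shares = classes.value_counts(normalize=True)
    rows = []
    for label, (low, high) in TARGETS.items():
        observed = float(shares.get(label, 0.0))
        status = "PASS" if low <= observed <= high else "FAIL"
        rows.append({"class": label, "observed": observed, "status": status})
    return pd.DataFrame(rows)


# Hypothetical values for illustration only, not taken from the dataset.
demo = pd.DataFrame(
    {"digitalization_level": [0.05, 0.29, 0.30, 0.54, 0.55, 0.79, 0.80, 1.00]}
)
demo["digitization_class"] = classify(demo["digitalization_level"])
report = prevalence_report(demo["digitization_class"])
print(demo)
print(report)
```

To run the same check on the real data, replace `demo` with a frame read from `data/baseline.csv` and pass its `digitalization_level` column to `classify`.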
### 4.3 Correlation Structure

| Pair | Target r | Observed r | Status |
|------|----------|------------|--------|
| cadastral_coverage ↔ title_coverage | 0.85 | 0.942 | PASS |
| cadastral_coverage ↔ digitalization | 0.65 | 0.905 | FAIL |
| title_coverage ↔ digitalization | 0.60 | 0.789 | PASS |
| title_coverage ↔ dispute_rate | −0.70 | −0.604 | PASS |
| digitalization ↔ registration_cost | −0.50 | −0.856 | FAIL |

Note: Stronger-than-target correlations reflect the shared country-tier structure in the generation process.

### 4.4 Cross-Scenario Monotonicity

| Metric | Reform | Baseline | Legacy | Monotonic? |
|--------|--------|----------|--------|------------|
| title_coverage (mean) | 0.257 | 0.162 | 0.076 | Yes |
| digitalization (mean) | 0.714 | 0.546 | 0.301 | Yes |
| dispute_rate (mean) | 0.077 | 0.135 | 0.215 | Yes |
| registration_cost (mean) | $106 | $198 | $328 | Yes |

### 4.5 Diagnostic Plots

![Validation Report](validation_report.png)

## 5. Usage

### 5.1 Loading with HuggingFace datasets

```python
from datasets import load_dataset

# Load baseline scenario (default)
ds = load_dataset("electricsheepafrica/african-land-registry-digitization")

# Load specific scenario
ds_reform = load_dataset(
    "electricsheepafrica/african-land-registry-digitization", "reform_progress"
)
```

### 5.2 Loading directly from CSV

```python
import pandas as pd

df = pd.read_csv("data/baseline.csv")
print(df.shape)
print(df.describe())
```

### 5.3 Regenerating with custom parameters

```bash
# Install dependencies
pip install numpy pandas scipy matplotlib

# Generate baseline (10K records)
python generate_dataset.py --scenario baseline --n 10000 --seed 42

# Generate all scenarios
for scenario in baseline reform_progress legacy_systems; do
    python generate_dataset.py --scenario "$scenario" --n 10000 --seed 42
done

# Run validation
python validate_dataset.py
```

## 6. Limitations & Ethical Considerations

1. **Synthetic data**: This dataset is synthetically generated and must not be used as a substitute for real land registry statistics in policy decisions, property transactions, or official reporting.
2. **Country-level aggregation**: The dataset represents region-type aggregates, not individual parcels or specific administrative units. Actual land registry performance varies significantly within countries.
3. **Customary tenure simplification**: Customary land rights are complex, context-specific, and often undocumented. The dataset models customary_tenure_pct as a simple complement to formal tenure, which oversimplifies reality.
4. **Dispute definition**: "Land dispute" encompasses a wide range of conflicts (boundary, inheritance, ownership, use rights). The dataset uses a unified dispute_rate metric.
5. **Cost methodology**: Registration costs include official fees but may exclude informal payments, transport costs, or opportunity costs of time.
6. **Temporal simplification**: The model does not capture specific reform milestones, technology deployments, or legal changes that may cause discontinuities.
7. **Gender dimensions**: Women's land rights are not explicitly modeled despite being a critical dimension of tenure security.
8. **No individual-level data**: Records represent region-type aggregates, not individual land parcels or transactions.

## 7. References

1. World Bank, *Sustaining the Success of Systematic Land Tenure Registration in Rwanda*, 2019.
2. Rwanda NLA, *e-Title System Launch*, 2023.
3. Kenya, *Digital Land Registry Rollout*, 2026.
4. Ghana, *98% of Lands Can't Be Registered Report*, 2026.
5. Uganda, *National Land Information System Enhancement*, 2024.
6. Angola, *GIS Modernization for Land Administration*, 2024.
7. World Bank, *Capitalizing on Digital Transformation in Property Institutions*, 2025.
8. Esri, *Rwanda Improves Land Management with GIS*, 2021.
9. Byamugisha, F. & Dubosse, N., *Investment Case for Land Tenure Security in SSA*, 2023. DOI: 10.1017/bca.2023.15
10. Rights and Resources, *Customary Land Tenure in Modern World*.
11. Atlantic Council, *Property Rights, Data, and Prosperity in Africa*, 2020.
12. Ehwi, R.J. & Asante, L.A., *Ex-Post Analysis of Land Title Registration in Ghana*, 2016. DOI: 10.1177/2158244016643351

## Citation

```bibtex
@dataset{esa_land_registry_2026,
  title={African Land Registry Digitization},
  author={{Electric Sheep Africa}},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/datasets/electricsheepafrica/african-land-registry-digitization},
  license={CC-BY-4.0}
}
```

## License

CC-BY-4.0
Provider:
electricsheepafrica