electricsheepafrica/african-civil-service-capacity

Name: electricsheepafrica/african-civil-service-capacity
Creator: electricsheepafrica
Published: 2026-03-20 23:54:23
License: CC BY 4.0

Source: Hugging Face. Updated: 2026-03-20. Indexed: 2026-03-29.

Download link:

https://hf-mirror.com/datasets/electricsheepafrica/african-civil-service-capacity

Description:

---
license: cc-by-4.0
task_categories:
- tabular-classification
- tabular-regression
language:
- en
tags:
- governance
- civil-service
- public-administration
- sub-saharan-africa
- synthetic
- lmic
pretty_name: African Civil Service Capacity
size_categories:
- 10K<n<100K
configs:
- config_name: baseline
  data_files:
  - split: train
    path: data/baseline.csv
- config_name: reform_modernized
  data_files:
  - split: train
    path: data/reform_modernized.csv
- config_name: underresourced
  data_files:
  - split: train
    path: data/underresourced.csv
---

# African Civil Service Capacity

## Abstract

This dataset contains 30,000 synthetic records (10,000 per scenario) describing civil-service staffing, qualifications, training, retention, and performance across 12 Sub-Saharan African countries. It is designed for tabular classification (capacity class: critical / low / moderate / high) and regression (capacity score, vacancy rate, digital literacy rate). Three counterfactual scenarios (**baseline**, **reform_modernized**, and **underresourced**) enable policy simulation of civil-service reform outcomes.

## Introduction

Civil service capacity is a primary determinant of public-sector delivery quality across Sub-Saharan Africa (SSA). Across the continent, public employees constitute fewer than 12% of total employment [Mo Ibrahim Foundation, 2018], vacancy rates range from 12% to 40% depending on country and region, and qualification gaps remain substantial: only 9% of African youth possess tertiary education [Mastercard Foundation, 2026].

Country-level evidence reveals sharp variation. Kenya's Public Service Commission reported 113,340 vacancies (36.5%) against 310,735 approved posts in 2025, alongside 1,019 fake certificates detected and just 46.6% participation in continuous professional development [Kenya PSC, 2025]. South Africa's vacancy rate stands at 19.1% with a public-sector wage bill consuming 10.4% of GDP [SA PSC, 2024; SA National Treasury MTBPS, 2024].
Uganda employs 366,574 public servants for 45.9 million citizens, roughly 80 per 10,000 [Uganda MoPS, 2023]. Botswana and Mauritius rank as top performers on public-service quality indices, while Chad scores lowest at 28.75% on service delivery [Mo Ibrahim Foundation, 2018].

This dataset provides a parameterised, reproducible synthetic resource for training ML models on African governance capacity without exposing sensitive personnel data.

## Methodology

### Parameterization

Country parameters are anchored to published administrative statistics. The table below maps each parameter to its evidence source.

| Parameter | Country | Value | Source |
|---|---|---|---|
| Vacancy rate | South Africa | 19.1% | SA PSC, Public Service Reforms Report, 2024 |
| Wage bill (% GDP) | South Africa | 10.4% | SA National Treasury, MTBPS Compensation & Employment Data, 2024 |
| Public servants | Uganda | 366,574 (80 per 10k) | Uganda MoPS, State of HR Report, 2023 |
| Population (2023/24) | Uganda | 45.9M | Uganda MoPS, 2023 |
| Public employment share | SSA average | <12% of total employment | Mo Ibrahim Foundation, Public Service in Africa, 2018 |
| Service delivery score | Chad (lowest) | 28.75% | Mo Ibrahim Foundation, 2018 |
| Top performers | Botswana, Mauritius | Ranked highest | Mo Ibrahim Foundation, 2018 |
| Vacancies | Kenya | 113,340 (36.5% of 310,735 approved posts) | Kenya PSC, Annual Report, 2025 |
| Fake certificates | Kenya | 1,019 found | Kenya PSC, 2025 |
| CPD participation | Kenya | 46.6% | Kenya PSC, 2025 |
| Tertiary education rate | SSA youth | 9% | Mastercard Foundation, 2026 |
| Africa development dynamics | Pan-African | Employment & governance metrics | OECD/AUC, Africa's Development Dynamics, 2024 |

### Generation Process

For each record:

1. A country is sampled uniformly from 12 SSA nations.
2. A year (2018–2025) is drawn; population is projected forward using each country's growth rate.
3. Region type (capital / urban / rural / remote) is drawn with weights that depend on the country's development tier, modulating vacancy and qualification rates.
4. Sector (10 categories) and grade level (5 categories) are sampled from empirical weight distributions.
5. Vacancy rate, degree rate, training hours, retention, salary, wage bill, performance evaluation, and digital literacy are computed as deterministic functions of country parameters, region adjustments, scenario multipliers, and bounded random noise.
6. A composite **capacity score** (0–1) is calculated as a weighted sum of vacancy, qualification, training, retention, evaluation, and digital literacy rates.
7. The score is discretised into four **capacity classes**: high (≥0.65), moderate (0.50–0.65), low (0.35–0.50), critical (<0.35).

### Scenario Design

| Scenario | Vacancy multiplier | Qualification multiplier | Training multiplier | Description |
|---|---|---|---|---|
| `baseline` | 1.0× | 1.0× | 1.0× | Current SSA civil service landscape |
| `reform_modernized` | 0.7× | 1.3× | 1.5× | Meritocratic recruitment reform, digital transformation, increased CPD investment |
| `underresourced` | 1.5× | 0.7× | 0.5× | Austerity, brain drain, reduced training budgets |

## Dataset Description

### Schema

| Column | Type | Description | Range |
|---|---|---|---|
| `record_id` | int | Unique record identifier | 1–10000 |
| `country` | str | Country name | 12 SSA nations |
| `year` | int | Observation year | 2018–2025 |
| `region_type` | str | Geographic region type | capital, urban, rural, remote |
| `sector` | str | Government sector | 10 sectors |
| `grade_level` | str | Seniority level | Junior, Mid_Level, Senior, Director, Executive |
| `population_millions` | float | Estimated population (millions) | 0.5–300 |
| `total_posts` | int | Total approved posts | not specified |
| `vacancy_rate` | float | Proportion of unfilled posts | 0.05–0.60 |
| `filled_posts` | int | Number of filled posts | not specified |
| `degree_rate` | float | Proportion with tertiary degree | 0.10–0.80 |
| `degree_holders` | int | Number holding degrees | not specified |
| `training_hours_annual` | int | Annual training hours per employee | 0–100+ |
| `training_participation_rate` | float | Proportion participating in CPD | 0.20–0.95 |
| `retention_rate` | float | Annual staff retention rate | 0.50–0.98 |
| `avg_salary_usd` | float | Average monthly salary (USD) | 20–5000 |
| `wage_bill_pct_gdp` | float | Public wage bill as % of GDP | 0–0.20 |
| `performance_eval_rate` | float | Proportion with formal evaluations | 0.15–0.95 |
| `digital_literacy_rate` | float | Digital literacy rate | 0.10–0.90 |
| `capacity_score` | float | Composite capacity score | 0.0–1.0 |
| `capacity_class` | str | Discretised capacity class | critical, low, moderate, high |

### Summary Statistics

| Metric | Baseline | Reform Modernized | Underresourced |
|---|---|---|---|
| Mean vacancy rate | 0.249 | 0.174 | 0.357 |
| Mean degree rate | 0.308 | 0.400 | 0.218 |
| Mean training hours | 24.5 | 37.0 | 12.0 |
| Mean capacity score | 0.535 | 0.623 | 0.439 |
| Capacity: critical | 908 | 43 | 2,664 |
| Capacity: low | 3,070 | 2,531 | 4,169 |
| Capacity: moderate | 3,322 | 3,564 | 3,164 |
| Capacity: high | 2,700 | 3,862 | 3 |

## Validation Results

All datasets pass column, range, categorical, and consistency checks. A small number of boundary-edge cases (scores landing exactly on class thresholds) are flagged but are benign floating-point artifacts.
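
The range, categorical, and consistency checks described above can be sketched roughly as follows. This is not the contents of `validate_dataset.py`, which is not shown here; the names `RANGES`, `expected_class`, and `check_frame` are hypothetical, the toy rows are invented for illustration, and the class thresholds are assumed to have inclusive lower bounds (so a score of exactly 0.50 counts as moderate).

```python
import pandas as pd

# Documented ranges for a few columns, copied from the schema table.
RANGES = {
    "vacancy_rate": (0.05, 0.60),
    "degree_rate": (0.10, 0.80),
    "retention_rate": (0.50, 0.98),
    "capacity_score": (0.0, 1.0),
}

CLASSES = {"critical", "low", "moderate", "high"}


def expected_class(score: float) -> str:
    """Map a capacity score to a class (assumption: lower bounds are inclusive)."""
    if score >= 0.65:
        return "high"
    if score >= 0.50:
        return "moderate"
    if score >= 0.35:
        return "low"
    return "critical"


def check_frame(df: pd.DataFrame) -> dict:
    """Count range, categorical, and class-consistency violations."""
    report = {}
    for col, (lo, hi) in RANGES.items():
        # Series.between is inclusive on both ends by default.
        report[f"{col}_out_of_range"] = int((~df[col].between(lo, hi)).sum())
    report["unknown_class"] = int((~df["capacity_class"].isin(CLASSES)).sum())
    derived = df["capacity_score"].map(expected_class)
    report["class_mismatch"] = int((derived != df["capacity_class"]).sum())
    return report


# Toy rows (illustrative values, not taken from the dataset).
toy = pd.DataFrame({
    "vacancy_rate": [0.25, 0.70],
    "degree_rate": [0.30, 0.20],
    "retention_rate": [0.85, 0.60],
    "capacity_score": [0.55, 0.30],
    "capacity_class": ["moderate", "low"],
})
report = check_frame(toy)
print(report)
```

To run the same checks on a real scenario file, pass `pd.read_csv("data/baseline.csv")` to `check_frame`. Any rows counted under `class_mismatch` would be the boundary cases to inspect by hand.
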
Diagnostic plots (generated by `validate_dataset.py`) are stored in `data/plots/`:

- `scenario_comparison.png`: Distribution overlays of six key metrics across scenarios
- `capacity_class_distribution.png`: Class count bar charts per scenario
- `vacancy_by_country.png`: Box plots of vacancy rate by country per scenario
- `capacity_by_country.png`: Box plots of capacity score by country per scenario

## Usage

### Loading Data

```python
import pandas as pd

baseline = pd.read_csv("data/baseline.csv")
reform = pd.read_csv("data/reform_modernized.csv")
underresourced = pd.read_csv("data/underresourced.csv")
```

### Classification Example

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

df = pd.read_csv("data/baseline.csv")

# Numeric inputs used to predict the discretised capacity class.
features = ["vacancy_rate", "degree_rate", "training_hours_annual",
            "training_participation_rate", "retention_rate", "avg_salary_usd",
            "wage_bill_pct_gdp", "performance_eval_rate", "digital_literacy_rate"]
X = df[features]
y = df["capacity_class"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```

### Regression Example

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Reuses `df`, `features`, and `train_test_split` from the classification example.
X = df[features]
y = df["capacity_score"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

reg = GradientBoostingRegressor(n_estimators=200, random_state=42)
reg.fit(X_train, y_train)

# Take the square root explicitly: the `squared=False` argument was deprecated
# and later removed from `mean_squared_error` in newer scikit-learn releases.
rmse = mean_squared_error(y_test, reg.predict(X_test)) ** 0.5
print("RMSE:", rmse)
```

### Hugging Face Datasets

```python
from datasets import load_dataset

ds = load_dataset("electricsheepafrica/african-civil-service-capacity", "baseline")
```

## Reproduction

```bash
pip install -r requirements.txt
python generate_dataset.py --scenario baseline --n 10000 --seed 42
python generate_dataset.py --scenario reform_modernized --n 10000 --seed 42
python generate_dataset.py --scenario underresourced --n 10000 --seed 42
python validate_dataset.py
```

## Limitations

1. **Synthetic data**: All records are generated, not drawn from administrative registers. Correlations are modelled, not empirically estimated from linked micro-data.
2. **Country uniformity**: Records are drawn uniformly across countries, not proportional to actual public employment volumes. Nigeria's 230M population gets the same weight as Botswana's 2.6M.
3. **Temporal dynamics**: Year-to-year changes use a simple growth projection; no autoregressive or lagged effects are modelled.
4. **Sector/grade independence**: Sector and grade assignments are independent draws; real staffing has cross-tabulation structure.
5. **No gender or age breakdowns**: The dataset does not disaggregate by demographics.
6. **Boundary sensitivity**: A small number of capacity scores land exactly on class thresholds due to floating-point arithmetic, producing negligible misclassifications in validation.

## References

1. South Africa Public Service Commission. *Public Service Reforms Report*. 2024.
2. South Africa National Treasury. *Medium Term Budget Policy Statement: Compensation and Employment Data*. 2024.
3. Statistics South Africa. *Annual Report 2023/24*. 2024.
4. Uganda Ministry of Public Service. *State of Human Resources Report*. 2023.
5. OECD / African Union Commission. *Africa's Development Dynamics 2024*. 2024.
6. Mo Ibrahim Foundation. *Public Service in Africa: Working for the People*. 2018.
7. Kenya Public Service Commission. *Annual Report*. 2025.
8. Mastercard Foundation. *Africa Youth Employment and Education Report*. 2026.
9. Africa Careers Network. *Employability in Africa Survey*. 2023.

## Citation

If you use this dataset, please cite:

```bibtex
@misc{esa_civil_service_2026,
  title     = {African Civil Service Capacity Dataset},
  author    = {Electric Sheep Africa},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/electricsheepafrica/african-civil-service-capacity},
  note      = {Synthetic tabular dataset for civil-service capacity classification and regression}
}
```

## License

This dataset is released under the [Creative Commons Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/) license. You are free to share and adapt the material for any purpose, including commercially, provided you give appropriate credit.

Provider:

electricsheepafrica
