electricsheepafrica/african-health-insurance-claims
Source: Hugging Face · Updated: 2026-03-21 · Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/african-health-insurance-claims
Resource description:
---
license: cc-by-4.0
task_categories:
- tabular-classification
- tabular-regression
tags:
- insurance
- health
- claims
- sub-saharan-africa
- synthetic
- healthtech
- healthcare
- fraud-detection
- reimbursement
pretty_name: African Health Insurance Claims
size_categories:
- 10K<n<100K
---
# African Health Insurance Claims
Synthetic health insurance claims dataset covering **12 Sub-Saharan African countries** across **3 policy scenarios**, with 19 variables per record.
## Dataset Description
This dataset simulates health insurance claims processing outcomes across Sub-Saharan Africa, drawing on publicly available data from national health insurance authorities including Nigeria's NHIA, Kenya's SHA, and Ghana's NHIS. It is designed for training ML models on claims classification (approval/rejection), fraud detection, reimbursement prediction, and healthcare cost estimation.
### Countries
Nigeria, Kenya, Ghana, South Africa, Tanzania, Rwanda, Ethiopia, Uganda, Senegal, Zambia, Malawi, Cameroon
### Scenarios
| Scenario | Description | Records/Country |
|---|---|---|
| `baseline` | Current-state claims processing with typical rejection/fraud rates | 3,000 |
| `expanded_coverage` | Expanded insurance coverage with improved processing (lower rejection, faster turnaround) | 4,000 |
| `claims_surge` | High-volume claims period with elevated rejection and fraud rates | 5,000 |
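The per-country counts in the table imply a total dataset size. The sketch below works through that arithmetic, assuming the counts apply uniformly to all 12 listed countries (the card does not state this explicitly); the function name `expected_rows` is illustrative. The result can be compared against the row count of the downloaded files.

```python
# Records per country for each scenario, copied from the Scenarios table.
RECORDS_PER_COUNTRY = {
    "baseline": 3_000,
    "expanded_coverage": 4_000,
    "claims_surge": 5_000,
}
N_COUNTRIES = 12  # number of countries listed in the Countries section


def expected_rows(per_country, n_countries):
    """Return (rows per scenario, total rows).

    Assumption: every country contributes the same number of records
    to each scenario.
    """
    per_scenario = {name: n * n_countries for name, n in per_country.items()}
    return per_scenario, sum(per_scenario.values())


per_scenario, total = expected_rows(RECORDS_PER_COUNTRY, N_COUNTRIES)
print(per_scenario)
print(total)
```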
### Variables
| Column | Type | Description |
|---|---|---|
| `claim_id` | string | Unique claim identifier (format: `COUNTRY-SCENARIO-NNNNNNN`) |
| `country` | string | Country name |
| `year` | int | Claim year (2021-2024) |
| `claim_date` | string | Claim submission date (YYYY-MM-DD) |
| `scenario` | string | Policy scenario (`baseline`, `expanded_coverage`, `claims_surge`) |
| `diagnosis_category` | string | Primary diagnosis: `malaria`, `diabetes`, `hypertension`, `maternity`, `surgery`, `outpatient` |
| `treatment_cost_usd` | float | Actual treatment cost in USD |
| `claimed_amount_usd` | float | Amount claimed by provider (may include overclaiming) |
| `approved_amount_usd` | float | Amount approved for reimbursement (0 if rejected) |
| `reimbursement_rate` | float | Approved amount / claimed amount (0 if rejected) |
| `processing_days` | int | Days from claim submission to decision |
| `rejection_flag` | int | 1 if claim rejected, 0 if approved |
| `rejection_reason` | string | Reason for rejection (or `none` if approved) |
| `provider_type` | string | `public`, `private`, or `faith_based` |
| `insurance_type` | string | `social`, `private`, or `micro` |
| `policyholder_age` | int | Age of policyholder (0-95) |
| `gender` | string | `M` or `F` |
| `claim_severity` | string | `mild`, `moderate`, `severe`, or `critical` |
| `fraud_flag` | int | 1 if claim flagged as potentially fraudulent, 0 otherwise |
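Several columns are tied together by the descriptions above: a rejected claim should have a zero approved amount and zero reimbursement rate, an approved claim should have `rejection_reason` equal to `none`, and `reimbursement_rate` should equal approved amount divided by claimed amount. A minimal sketch of a per-record consistency check follows. The helper name `check_claim` and the tolerance `tol` are assumptions (the card does not say how rates are rounded), and records are represented as plain dicts keyed by column name.

```python
def check_claim(rec, tol=0.01):
    """Return a list of inconsistencies found in one claim record.

    `rec` is a dict keyed by the documented column names.
    `tol` is an assumed tolerance for rounding in `reimbursement_rate`.
    """
    problems = []
    if rec["rejection_flag"] == 1:
        # Rejected claims: nothing approved, and a reason must be given.
        if rec["approved_amount_usd"] != 0:
            problems.append("rejected claim has non-zero approved amount")
        if rec["reimbursement_rate"] != 0:
            problems.append("rejected claim has non-zero reimbursement rate")
        if rec["rejection_reason"] == "none":
            problems.append("rejected claim has no rejection reason")
    else:
        # Approved claims: no reason, and the rate must match the amounts.
        if rec["rejection_reason"] != "none":
            problems.append("approved claim has a rejection reason")
        if rec["claimed_amount_usd"] > 0:
            expected = rec["approved_amount_usd"] / rec["claimed_amount_usd"]
            if abs(rec["reimbursement_rate"] - expected) > tol:
                problems.append("reimbursement rate does not match amounts")
    return problems


# Made-up records for illustration only; not taken from the dataset.
ok = {
    "rejection_flag": 0, "rejection_reason": "none",
    "claimed_amount_usd": 100.0, "approved_amount_usd": 80.0,
    "reimbursement_rate": 0.8,
}
bad = {
    "rejection_flag": 1, "rejection_reason": "duplicate_claim",
    "claimed_amount_usd": 50.0, "approved_amount_usd": 10.0,
    "reimbursement_rate": 0.0,
}
print(check_claim(ok))
print(check_claim(bad))
```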
### Rejection Reasons
`treatment_diagnosis_mismatch`, `duplicate_claim`, `inactive_member`, `inappropriate_prescription`, `oversupply_medication`, `wrong_tariff_applied`, `missing_documentation`, `exceeded_annual_limit`, `pre_authorization_missing`, `age_treatment_mismatch`, `sex_treatment_mismatch`, `out_of_network_provider`
## Data Sources & Parameters
Country-level parameters (health expenditure per capita, insurance penetration, disease prevalence) are derived from:
- **WHO African Region Health Expenditure Atlas 2023**
- **World Bank Health Expenditure Database (2024)**
- **Nigeria NHIA Fee-for-Service Price List (2024-2025)**
- **Kenya Social Health Insurance General Regulations (2024)**
- **Ghana NHIS claims review studies** (Nsiah-Boateng et al., 2017; Adzakpah & Dwomoh, 2023)
- **ACFE Report to the Nations: Sub-Saharan Africa Edition (2020)**
- **Board of Healthcare Funders Southern Africa fraud statistics**
Treatment costs are drawn from log-normal distributions calibrated to each country's healthcare spending level. Rejection rates (7-13%) and fraud rates (3.5-6.5%) reflect ranges reported in the SSA health insurance literature.
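To illustrate what calibrating a log-normal distribution to a spending level can look like, the sketch below picks the location parameter so that the distribution has a chosen mean, using the identity mean = exp(mu + sigma^2 / 2). This is not the dataset's generator: the function names, the target mean of 120 USD, and the shape parameter 0.8 are all hypothetical values chosen for illustration.

```python
import math
import random


def lognormal_mu(target_mean, sigma):
    """Return mu such that a log-normal(mu, sigma) has the given mean.

    Uses mean = exp(mu + sigma**2 / 2), solved for mu.
    """
    return math.log(target_mean) - sigma ** 2 / 2


def sample_costs(target_mean, sigma, n, seed=0):
    """Draw n positive costs whose expected value is target_mean."""
    rng = random.Random(seed)  # seeded for reproducibility
    mu = lognormal_mu(target_mean, sigma)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


# Hypothetical parameters; the card does not publish the actual ones.
costs = sample_costs(target_mean=120.0, sigma=0.8, n=50_000)
print(sum(costs) / len(costs))  # sample mean, close to the target mean
```

A larger `sigma` keeps the same mean but produces a heavier right tail, which is the usual reason for choosing a log-normal shape for cost data.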
## Intended Use
- **Tabular classification**: Predict claim rejection (`rejection_flag`), detect fraud (`fraud_flag`)
- **Tabular regression**: Predict approved amount, reimbursement rate, processing days
- **Fairness analysis**: Examine disparities across countries, provider types, insurance types, gender, age groups
- **Policy simulation**: Compare outcomes across scenarios (baseline vs. expanded coverage vs. surge)
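The policy-simulation and fairness uses above both reduce to comparing a flag rate across groups. A minimal sketch in plain Python, assuming records are dicts keyed by the documented column names; the helper `rate_by_group` and the toy records are illustrative and not drawn from the dataset.

```python
from collections import defaultdict


def rate_by_group(records, group_col, flag_col):
    """Return the share of records with flag_col == 1 per value of group_col."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for rec in records:
        key = rec[group_col]
        totals[key] += 1
        flagged[key] += rec[flag_col]  # flags are documented as 0 or 1
    return {key: flagged[key] / totals[key] for key in totals}


# Made-up toy records for illustration only.
toy = [
    {"scenario": "baseline", "rejection_flag": 0},
    {"scenario": "baseline", "rejection_flag": 0},
    {"scenario": "baseline", "rejection_flag": 0},
    {"scenario": "baseline", "rejection_flag": 1},
    {"scenario": "claims_surge", "rejection_flag": 1},
    {"scenario": "claims_surge", "rejection_flag": 0},
]
rates = rate_by_group(toy, "scenario", "rejection_flag")
print(rates)
```

The same function works for `fraud_flag` and for grouping columns such as `country`, `provider_type`, `insurance_type`, or `gender`.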
## Limitations
- Synthetic data; does not represent actual claims from any real insurance scheme
- Cost distributions are approximate and may not capture local price variations
- Fraud labels are probabilistic, not based on confirmed investigations
- Processing times reflect system averages, not individual case complexity
## Citation
If you use this dataset, please cite:
```
@dataset{african_health_insurance_claims_2026,
title={African Health Insurance Claims Dataset},
author={Electric Sheep Africa},
year={2026},
license={cc-by-4.0},
url={https://huggingface.co/datasets/electricsheepafrica/african-health-insurance-claims}
}
```
## License
CC BY 4.0
Provider:
electricsheepafrica



