
electricsheepafrica/african-health-insurance-claims

Hugging Face, updated 2026-03-21, indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/african-health-insurance-claims
Resource description:
---
license: cc-by-4.0
task_categories:
- tabular-classification
- tabular-regression
tags:
- insurance
- health
- claims
- sub-saharan-africa
- synthetic
- healthtech
- healthcare
- fraud-detection
- reimbursement
pretty_name: African Health Insurance Claims
size_categories:
- 10K<n<100K
---

# African Health Insurance Claims

Synthetic health insurance claims dataset covering **12 Sub-Saharan African countries** across **3 policy scenarios**, with ~19 variables per record.

## Dataset Description

This dataset simulates health insurance claims processing outcomes across Sub-Saharan Africa, drawing on publicly available data from national health insurance authorities including Nigeria's NHIA, Kenya's SHA, and Ghana's NHIS. It is designed for training ML models on claims classification (approval/rejection), fraud detection, reimbursement prediction, and healthcare cost estimation.

### Countries

Nigeria, Kenya, Ghana, South Africa, Tanzania, Rwanda, Ethiopia, Uganda, Senegal, Zambia, Malawi, Cameroon

### Scenarios

| Scenario | Description | Records/Country |
|---|---|---|
| `baseline` | Current-state claims processing with typical rejection/fraud rates | 3,000 |
| `expanded_coverage` | Expanded insurance coverage with improved processing (lower rejection, faster turnaround) | 4,000 |
| `claims_surge` | High-volume claims period with elevated rejection and fraud rates | 5,000 |

### Variables

| Column | Type | Description |
|---|---|---|
| `claim_id` | string | Unique claim identifier (format: `COUNTRY-SCENARIO-NNNNNNN`) |
| `country` | string | Country name |
| `year` | int | Claim year (2021-2024) |
| `claim_date` | string | Claim submission date (YYYY-MM-DD) |
| `scenario` | string | Policy scenario (`baseline`, `expanded_coverage`, `claims_surge`) |
| `diagnosis_category` | string | Primary diagnosis: `malaria`, `diabetes`, `hypertension`, `maternity`, `surgery`, `outpatient` |
| `treatment_cost_usd` | float | Actual treatment cost in USD |
| `claimed_amount_usd` | float | Amount claimed by provider (may include overclaiming) |
| `approved_amount_usd` | float | Amount approved for reimbursement (0 if rejected) |
| `reimbursement_rate` | float | Approved amount / claimed amount (0 if rejected) |
| `processing_days` | int | Days from claim submission to decision |
| `rejection_flag` | int | 1 if claim rejected, 0 if approved |
| `rejection_reason` | string | Reason for rejection (or `none` if approved) |
| `provider_type` | string | `public`, `private`, or `faith_based` |
| `insurance_type` | string | `social`, `private`, or `micro` |
| `policyholder_age` | int | Age of policyholder (0-95) |
| `gender` | string | `M` or `F` |
| `claim_severity` | string | `mild`, `moderate`, `severe`, or `critical` |
| `fraud_flag` | int | 1 if claim flagged as potentially fraudulent, 0 otherwise |

### Rejection Reasons

`treatment_diagnosis_mismatch`, `duplicate_claim`, `inactive_member`, `inappropriate_prescription`, `oversupply_medication`, `wrong_tariff_applied`, `missing_documentation`, `exceeded_annual_limit`, `pre_authorization_missing`, `age_treatment_mismatch`, `sex_treatment_mismatch`, `out_of_network_provider`

## Data Sources & Parameters

Country-level parameters (health expenditure per capita, insurance penetration, disease prevalence) are derived from:

- **WHO African Region Health Expenditure Atlas 2023**
- **World Bank Health Expenditure Database (2024)**
- **Nigeria NHIA Fee-for-Service Price List (2024-2025)**
- **Kenya Social Health Insurance General Regulations (2024)**
- **Ghana NHIS claims review studies** (Nsiah-Boateng et al., 2017; Adzakpah & Dwomoh, 2023)
- **ACFE Report to the Nations: Sub-Saharan Africa Edition (2020)**
- **Board of Healthcare Funders Southern Africa fraud statistics**

Cost distributions use log-normal distributions calibrated to each country's healthcare spending level. Rejection rates (7-13%) and fraud rates (3.5-6.5%) reflect reported ranges from SSA health insurance literature.
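
To make the generation idea above more concrete, here is a minimal sketch of drawing log-normal treatment costs together with Bernoulli rejection and fraud flags. It is not the generator used to build this dataset. The function name `sample_claims`, the median cost of 40 USD, and the spread `sigma=0.8` are hypothetical values chosen for illustration. The 10% rejection rate and 5% fraud rate are simply picked from inside the ranges quoted above.

```python
import math
import random


def sample_claims(n, median_cost_usd, sigma, rejection_rate, fraud_rate, seed=0):
    """Draw n illustrative claims. All parameters are hypothetical."""
    rng = random.Random(seed)
    # The median of a log-normal distribution is exp(mu).
    mu = math.log(median_cost_usd)
    claims = []
    for _ in range(n):
        cost = rng.lognormvariate(mu, sigma)
        claims.append(
            {
                "treatment_cost_usd": round(cost, 2),
                # Independent Bernoulli draws; the real generator may
                # correlate these flags with other columns.
                "rejection_flag": 1 if rng.random() < rejection_rate else 0,
                "fraud_flag": 1 if rng.random() < fraud_rate else 0,
            }
        )
    return claims


claims = sample_claims(
    10_000,
    median_cost_usd=40.0,  # assumed value, not a calibrated country parameter
    sigma=0.8,             # assumed spread
    rejection_rate=0.10,   # inside the 7-13% range quoted above
    fraud_rate=0.05,       # inside the 3.5-6.5% range quoted above
)
rejection_share = sum(c["rejection_flag"] for c in claims) / len(claims)
fraud_share = sum(c["fraud_flag"] for c in claims) / len(claims)
print(f"rejection share: {rejection_share:.3f}, fraud share: {fraud_share:.3f}")
```

Setting `mu` to the logarithm of a target median keeps the sketch easy to reason about, since half of the sampled costs fall below that value regardless of `sigma`.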

## Intended Use

- **Tabular classification**: Predict claim rejection (`rejection_flag`), detect fraud (`fraud_flag`)
- **Tabular regression**: Predict approved amount, reimbursement rate, processing days
- **Fairness analysis**: Examine disparities across countries, provider types, insurance types, gender, age groups
- **Policy simulation**: Compare outcomes across scenarios (baseline vs. expanded coverage vs. surge)

## Limitations

- Synthetic data; does not represent actual claims from any real insurance scheme
- Cost distributions are approximate and may not capture local price variations
- Fraud labels are probabilistic, not based on confirmed investigations
- Processing times reflect system averages, not individual case complexity

## Citation

If you use this dataset, please cite:

```
@dataset{african_health_insurance_claims_2026,
  title={African Health Insurance Claims Dataset},
  author={Electric Sheep Africa},
  year={2026},
  license={cc-by-4.0},
  url={https://huggingface.co/datasets/electricsheepafrica/african-health-insurance-claims}
}
```

## License

CC BY 4.0
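
As a rough illustration of the policy-simulation use case and of the column rules in the variables table, the sketch below compares rejection rates across scenarios and checks each row against the documented rules (approved amount, reimbursement rate, and rejection reason for rejected versus approved claims). The rows are toy values invented for this example and are not taken from the dataset. The helper names `rejection_rate_by_scenario` and `violations` and the tolerance `tol` are hypothetical.

```python
from collections import defaultdict

# Toy rows invented for illustration; they are NOT records from the dataset.
rows = [
    {"scenario": "baseline", "claimed_amount_usd": 100.0, "approved_amount_usd": 80.0,
     "reimbursement_rate": 0.8, "rejection_flag": 0, "rejection_reason": "none"},
    {"scenario": "baseline", "claimed_amount_usd": 50.0, "approved_amount_usd": 0.0,
     "reimbursement_rate": 0.0, "rejection_flag": 1, "rejection_reason": "missing_documentation"},
    {"scenario": "expanded_coverage", "claimed_amount_usd": 200.0, "approved_amount_usd": 180.0,
     "reimbursement_rate": 0.9, "rejection_flag": 0, "rejection_reason": "none"},
    {"scenario": "expanded_coverage", "claimed_amount_usd": 60.0, "approved_amount_usd": 60.0,
     "reimbursement_rate": 1.0, "rejection_flag": 0, "rejection_reason": "none"},
    {"scenario": "claims_surge", "claimed_amount_usd": 120.0, "approved_amount_usd": 0.0,
     "reimbursement_rate": 0.0, "rejection_flag": 1, "rejection_reason": "duplicate_claim"},
    {"scenario": "claims_surge", "claimed_amount_usd": 90.0, "approved_amount_usd": 45.0,
     "reimbursement_rate": 0.5, "rejection_flag": 0, "rejection_reason": "none"},
]


def rejection_rate_by_scenario(rows):
    """Share of rejected claims per scenario."""
    totals = defaultdict(int)
    rejected = defaultdict(int)
    for row in rows:
        totals[row["scenario"]] += 1
        rejected[row["scenario"]] += row["rejection_flag"]
    return {scenario: rejected[scenario] / totals[scenario] for scenario in totals}


def violations(row, tol=0.01):
    """List the documented column rules that a row breaks (tol is an assumed rounding allowance)."""
    problems = []
    if row["rejection_flag"] == 1:
        if row["approved_amount_usd"] != 0:
            problems.append("rejected claim has a non-zero approved amount")
        if row["reimbursement_rate"] != 0:
            problems.append("rejected claim has a non-zero reimbursement rate")
        if row["rejection_reason"] == "none":
            problems.append("rejected claim has no rejection reason")
    else:
        if row["rejection_reason"] != "none":
            problems.append("approved claim carries a rejection reason")
        claimed = row["claimed_amount_usd"]
        if claimed > 0:
            expected = row["approved_amount_usd"] / claimed
            if abs(row["reimbursement_rate"] - expected) > tol:
                problems.append("reimbursement rate differs from approved / claimed")
    return problems


print(rejection_rate_by_scenario(rows))
print([violations(row) for row in rows])
```

The same two helpers should work on real records loaded as dictionaries, for instance through `csv.DictReader` on a local export, provided the numeric columns are first converted from strings.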
Provider:
electricsheepafrica