electricsheepafrica/african-insurance-fraud-detection
Source: Hugging Face; updated 2026-03-21; indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/african-insurance-fraud-detection
Resource summary:
---
license: cc-by-4.0
task_categories:
- tabular-classification
tags:
- insurance
- fraud
- detection
- ml
- sub-saharan-africa
- synthetic
- insurtech
pretty_name: African Insurance Fraud Detection
size_categories:
- 100K<n<1M
language:
- en
---
# African Insurance Fraud Detection
Synthetic insurance claims dataset for fraud detection across 12 Sub-Saharan African countries, covering three operational scenarios.
## Dataset Description
- **12 countries:** South Africa, Nigeria, Kenya, Ghana, Tanzania, Uganda, Rwanda, Ethiopia, Senegal, Côte d'Ivoire, Zambia, Mozambique
- **3 scenarios:** `baseline` (~12 % fraud), `enhanced_detection` (~12 % fraud, higher detection power), `fraud_wave` (~22 % fraud)
- **360 000 total records** (10 000 per country per scenario × 12 countries × 3 scenarios)
- **Year:** 2024
## Variables
| Variable | Type | Description |
|---|---|---|
| `record_id` | int | Unique row identifier |
| `country` | str | Country name |
| `year` | int | Reporting year |
| `claim_amount_usd` | float | Claim value in USD |
| `policy_duration_months` | int | Months since policy inception |
| `previous_claims_count` | int | Prior claims on the policy |
| `reporting_delay_days` | int | Days between incident and report |
| `document_completeness_score` | float | Completeness of submitted docs (0–1) |
| `claimant_age` | int | Age of claimant (18–85) |
| `claim_type` | str | motor, health, property, life, travel, crop |
| `seasonality_flag` | int | 1 if festive-season month (Nov–Jan) |
| `amount_deviation_zscore` | float | Z-score of claim amount vs country mean |
| `duplicate_claim_flag` | int | 1 if a potential duplicate claim was detected |
| `fraud_label` | int | 1 = fraudulent, 0 = legitimate |
| `fraud_probability` | float | Modelled probability of fraud (0–1) |
| `fraud_scheme_type` | str | Scheme category (or "none") |
| `detection_method` | str | How fraud was identified (or "none") |
| `recovery_amount_usd` | float | Amount recovered from the fraudster (USD) |
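The table implies several checks worth running after loading: bounded scores, the 18–85 age range, the six claim types, binary flags, and a per-country z-score for claim amounts. The sketch below is a minimal standard-library version, assuming rows arrive as strings from `csv.DictReader` and that integer columns may be written as `3` or `3.0`. The z-score recomputation, written as (amount − country mean) / country std, assumes a population standard deviation over raw USD amounts within one scenario file; the card does not state the estimator, so `parse_row`, `check_row`, and `country_zscores` are hypothetical helpers, not the generator's own code.

```python
import statistics
from collections import defaultdict

# Allowed values and column types taken from the variable table.
CLAIM_TYPES = {"motor", "health", "property", "life", "travel", "crop"}
INT_COLS = ("record_id", "year", "policy_duration_months", "previous_claims_count",
            "reporting_delay_days", "claimant_age", "seasonality_flag",
            "duplicate_claim_flag", "fraud_label")
FLOAT_COLS = ("claim_amount_usd", "document_completeness_score",
              "amount_deviation_zscore", "fraud_probability", "recovery_amount_usd")


def parse_row(raw):
    """Convert one csv.DictReader row (all strings) into typed values."""
    row = dict(raw)
    for col in INT_COLS:
        row[col] = int(float(row[col]))  # tolerate both "3" and "3.0"
    for col in FLOAT_COLS:
        row[col] = float(row[col])
    return row


def check_row(row):
    """Return a list of violations of the documented ranges for one typed row."""
    problems = []
    for col in ("document_completeness_score", "fraud_probability"):
        if not 0.0 <= row[col] <= 1.0:
            problems.append(f"{col} outside 0-1")
    if not 18 <= row["claimant_age"] <= 85:
        problems.append("claimant_age outside 18-85")
    if row["claim_type"] not in CLAIM_TYPES:
        problems.append(f"unknown claim_type {row['claim_type']!r}")
    for col in ("seasonality_flag", "duplicate_claim_flag", "fraud_label"):
        if row[col] not in (0, 1):
            problems.append(f"{col} is not 0/1")
    return problems


def country_zscores(rows):
    """Recompute (amount - country mean) / country std for each typed row.

    Assumption: population std over raw USD amounts within one scenario file.
    Countries with zero spread get a z-score of 0.0.
    """
    amounts = defaultdict(list)
    for r in rows:
        amounts[r["country"]].append(r["claim_amount_usd"])
    stats = {c: (statistics.mean(v), statistics.pstdev(v)) for c, v in amounts.items()}
    out = []
    for r in rows:
        mean, std = stats[r["country"]]
        out.append((r["claim_amount_usd"] - mean) / std if std > 0 else 0.0)
    return out
```

Comparing `country_zscores(rows)` with the stored `amount_deviation_zscore` column (for example, the largest absolute difference) is a quick way to tell whether the generator used this convention or a different one, such as a sample standard deviation or log-scaled amounts.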
## Scenarios
| Scenario | Fraud rate | Description |
|---|---|---|
| `baseline` | ~12 % | Typical operational conditions |
| `enhanced_detection` | ~12 % | Same fraud rate, higher detection power |
| `fraud_wave` | ~22 % | Elevated fraud across all countries |
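The scenario table can be checked per file: `fraud_wave` should sit near 22 % and the other two near 12 %, and the elevated rate should appear in every country rather than only a few. How `enhanced_detection` encodes "higher detection power" is not defined in the card; the sketch below assumes it shows up as a larger share of fraudulent rows with a `detection_method` other than `"none"`. It uses only the standard library and works on rows read with `csv.DictReader`; the helper names are hypothetical.

```python
from collections import defaultdict


def fraud_rate(rows):
    """Share of rows labelled fraudulent (fraud_label == 1)."""
    if not rows:
        return float("nan")
    return sum(int(r["fraud_label"]) for r in rows) / len(rows)


def rates_by_country(rows):
    """Fraud rate per country, to see whether a scenario shifts all countries."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["country"]].append(r)
    return {country: fraud_rate(group) for country, group in groups.items()}


def detected_share(rows):
    """Share of fraudulent rows whose detection_method is not "none".

    Assumption: fraud that went undetected is stored with detection_method == "none".
    """
    fraud = [r for r in rows if int(r["fraud_label"]) == 1]
    if not fraud:
        return float("nan")
    return sum(r["detection_method"] != "none" for r in fraud) / len(fraud)
```

Run on `baseline` and `enhanced_detection` side by side, `fraud_rate` should be similar; if the assumption about `detection_method` holds, `detected_share` should be higher for `enhanced_detection`.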
## Files
- `data/baseline.csv`
- `data/enhanced_detection.csv`
- `data/fraud_wave.csv`
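The three files share the schema in the variable table, so one convenient pattern is to read them together and keep the scenario as an extra column. A minimal standard-library sketch, assuming each file has a header row, is UTF-8 encoded (country names such as Côte d'Ivoire contain non-ASCII characters), and sits under a local `data/` directory after download; `load_scenarios` and `count_grid` are hypothetical helper names.

```python
import csv
from collections import Counter
from pathlib import Path

SCENARIOS = ("baseline", "enhanced_detection", "fraud_wave")


def load_scenarios(data_dir="data"):
    """Read <data_dir>/<scenario>.csv for each scenario; tag rows with 'scenario'."""
    rows = []
    for name in SCENARIOS:
        path = Path(data_dir) / f"{name}.csv"
        with path.open(newline="", encoding="utf-8") as fh:
            for row in csv.DictReader(fh):
                row["scenario"] = name  # keep provenance after concatenation
                rows.append(row)
    return rows


def count_grid(rows):
    """Count rows per (scenario, country) pair."""
    return Counter((r["scenario"], r["country"]) for r in rows)
```

If the card's counts hold, `count_grid(load_scenarios("data"))` should contain 36 cells (3 scenarios × 12 countries) of 10 000 rows each, for 360 000 rows in total.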
## Intended Use
Training and benchmarking tabular classification models for insurance fraud detection in African markets. Suitable for gradient-boosted trees, logistic regression, and neural network approaches.
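Two details matter when benchmarking on this data. First, with roughly 12 % positives, a model that always predicts "legitimate" is already about 88 % accurate, so precision, recall, and F1 on the fraud class say more than accuracy. Second, several columns look like outcomes of the fraud label rather than information available when a claim is filed (`fraud_probability`, `fraud_scheme_type`, `detection_method`, `recovery_amount_usd`); using them as features would likely leak the label. Which columns to exclude is our judgment, not something the card states. A standard-library sketch of the feature/label split, a stratified train/test split, and the metrics, with hypothetical helper names:

```python
import random

# Hypothetical choice: columns treated as outcomes of the label (or identifiers)
# rather than claim-time features. The card does not specify this.
LEAKY = {"fraud_label", "fraud_probability", "fraud_scheme_type",
         "detection_method", "recovery_amount_usd", "record_id"}


def split_xy(rows):
    """Separate feature dicts from integer labels, dropping LEAKY columns."""
    X = [{k: v for k, v in r.items() if k not in LEAKY} for r in rows]
    y = [int(r["fraud_label"]) for r in rows]
    return X, y


def stratified_split(y, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) with a similar fraud share in both parts."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (0, 1):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive (fraud) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Any of the model families listed above can be fit on the `train` indices and scored with `precision_recall_f1` on the `test` indices; stratifying per label keeps the ~12 % (or ~22 %) fraud share comparable between the two parts.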
## Limitations
Synthetic data with simplified correlations; does not capture real-world inter-claim dependencies, organised fraud networks, or temporal dynamics beyond a single year.
## License
CC-BY-4.0
Provided by:
electricsheepafrica



