electricsheepafrica/african-insurance-fraud-detection

Source: Hugging Face. Updated 2026-03-21; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/african-insurance-fraud-detection
Resource description:
---
license: cc-by-4.0
task_categories:
- tabular-classification
tags:
- insurance
- fraud
- detection
- ml
- sub-saharan-africa
- synthetic
- insurtech
pretty_name: African Insurance Fraud Detection
size_categories:
- 100K<n<1M
language:
- en
---

# African Insurance Fraud Detection

Synthetic insurance claims dataset for fraud detection across 12 Sub-Saharan African countries, covering three operational scenarios.

## Dataset Description

- **12 countries:** South Africa, Nigeria, Kenya, Ghana, Tanzania, Uganda, Rwanda, Ethiopia, Senegal, Côte d'Ivoire, Zambia, Mozambique
- **3 scenarios:** baseline (~12 % fraud), enhanced_detection, fraud_wave (~22 % fraud)
- **360 000 total records** (10 000 per country × 12 countries × 3 scenarios)
- **Year:** 2024

## Variables

| Variable | Type | Description |
|---|---|---|
| `record_id` | int | Unique row identifier |
| `country` | str | Country name |
| `year` | int | Reporting year |
| `claim_amount_usd` | float | Claim value in USD |
| `policy_duration_months` | int | Months since policy inception |
| `previous_claims_count` | int | Prior claims on the policy |
| `reporting_delay_days` | int | Days between incident and report |
| `document_completeness_score` | float | Completeness of submitted docs (0–1) |
| `claimant_age` | int | Age of claimant (18–85) |
| `claim_type` | str | motor, health, property, life, travel, crop |
| `seasonality_flag` | int | 1 if festive-season month (Nov–Jan) |
| `amount_deviation_zscore` | float | Z-score of claim amount vs country mean |
| `duplicate_claim_flag` | int | Potential duplicate claim detected |
| `fraud_label` | int | 1 = fraudulent, 0 = legitimate |
| `fraud_probability` | float | Modelled probability of fraud (0–1) |
| `fraud_scheme_type` | str | Scheme category (or "none") |
| `detection_method` | str | How fraud was identified (or "none") |
| `recovery_amount_usd` | float | Amount recovered from fraudster |

## Scenarios

| Scenario | Fraud rate | Description |
|---|---|---|
| `baseline` | ~12 % | Typical operational conditions |
| `enhanced_detection` | ~12 % | Same fraud rate, higher detection power |
| `fraud_wave` | ~22 % | Elevated fraud across all countries |

## Files

- `data/baseline.csv`
- `data/enhanced_detection.csv`
- `data/fraud_wave.csv`

## Intended Use

Training and benchmarking tabular classification models for insurance fraud detection in African markets. Suitable for gradient-boosted trees, logistic regression, and neural network approaches.

## Limitations

Synthetic data with simplified correlations; does not capture real-world inter-claim dependencies, organised fraud networks, or temporal dynamics beyond a single year.

## License

CC-BY-4.0
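The approximate fraud rates quoted for each scenario can be checked against the files with a short script. The following is a minimal sketch, not part of the original dataset card. It assumes each CSV has a header row whose column names match the variables table (`country`, `fraud_label`) and that `fraud_label` is stored as the integer text `0` or `1`. The helper names `fraud_rate_by_country` and `fraud_rate_from_csv` are hypothetical, and the sample rows are made up purely to show the expected shape.

```python
import csv
import io
from collections import defaultdict

# File paths as listed in the dataset card; adjust to where the files
# were actually downloaded.
SCENARIO_FILES = {
    "baseline": "data/baseline.csv",
    "enhanced_detection": "data/enhanced_detection.csv",
    "fraud_wave": "data/fraud_wave.csv",
}


def fraud_rate_by_country(rows):
    """Return {country: share of rows with fraud_label == 1}.

    `rows` is any iterable of dicts with "country" and "fraud_label" keys
    (assumption: fraud_label parses as an int, 0 or 1).
    """
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for row in rows:
        totals[row["country"]] += 1
        frauds[row["country"]] += int(row["fraud_label"])
    return {country: frauds[country] / totals[country] for country in totals}


def fraud_rate_from_csv(path):
    """Read one scenario CSV and compute per-country fraud rates."""
    with open(path, newline="", encoding="utf-8") as fh:
        return fraud_rate_by_country(csv.DictReader(fh))


# Made-up sample rows (NOT taken from the dataset) to demonstrate the helper.
SAMPLE = """country,fraud_label
Kenya,1
Kenya,0
Kenya,0
Kenya,0
Ghana,0
Ghana,1
"""
sample_rates = fraud_rate_by_country(csv.DictReader(io.StringIO(SAMPLE)))
print(sample_rates)
```

With the real files in place, `fraud_rate_from_csv(SCENARIO_FILES["fraud_wave"])` would return one rate per country, which can then be compared with the roughly 22 % figure stated for that scenario.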
Provider:
electricsheepafrica