electricsheepafrica/african-life-insurance-actuarial
Source: Hugging Face. Updated 2026-04-02; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/african-life-insurance-actuarial
---
pretty_name: African Life Insurance Actuarial
license: cc-by-4.0
language:
- en
tags:
- insurance
- life-insurance
- actuarial
- mortality
- sub-saharan-africa
- synthetic
- hiv
- health
- africa
- south-africa
- nigeria
- kenya
- tabular
size_categories:
- 10K<n<100K
task_categories:
- tabular-classification
- tabular-regression
configs:
- config_name: baseline
  data_files: data/baseline.csv
- config_name: improved_mortality
  data_files: data/improved_mortality.csv
- config_name: pandemic_impact
  data_files: data/pandemic_impact.csv
---
# African Life Insurance Actuarial Dataset
## Abstract
A synthetic actuarial dataset for life insurance markets across 12 Sub-Saharan African (SSA) countries. The dataset contains 10,000 records per scenario across three scenarios (baseline, improved mortality, pandemic impact), totaling 30,000 observations with 25 variables covering mortality rates, life expectancy, HIV prevalence, policy metrics, claims data, and actuarial quality indicators. It is calibrated to published mortality tables from the Actuarial Society of South Africa and the Kenya KE 2007-2010 tables, WHO life expectancy estimates, and UNAIDS HIV prevalence data.
## Introduction
Life insurance markets in Sub-Saharan Africa face actuarial challenges distinct from those of developed markets: elevated mortality rates driven by infectious disease burden (particularly HIV/AIDS), lower policy persistency due to income volatility, developing actuarial infrastructure, and heterogeneous regulatory environments. This dataset gives researchers, actuaries, and policymakers a realistic synthetic environment in which to study these dynamics.
The dataset covers 12 SSA countries representing ~60% of the region's population and the majority of formal life insurance markets: South Africa, Nigeria, Kenya, Ghana, Tanzania, Rwanda, Uganda, Ethiopia, Senegal, Zambia, Côte d'Ivoire, and Mozambique.
## Methodology
### Parameterization Evidence Table
| Parameter | Source | Key Values |
|-----------|--------|------------|
| Mortality rates | Actuarial Society of South Africa (ASSA) Axxx tables; Kenya KE 2007-2010 mortality tables | Age-specific qx values calibrated to SSA context |
| Life expectancy | WHO Global Health Observatory (2023) | SA: 65y, Nigeria: 55y, Kenya: 67y, range 55-69y across SSA |
| HIV prevalence | UNAIDS Global AIDS Update (2023) | SA: 13.0%, Zambia: 11.0%, Mozambique: 11.5%, Nigeria: 1.5% |
| Policy persistency | Insurance market studies (SSA) | 58-85% range, lower than developed markets (85-95%) |
| Morbidity rates | WHO SSA disease burden data | Infectious disease multiplier 1.8× developed market baseline |
| HIV mortality impact | ASSA HIV model; UNAIDS mortality estimates | Peak impact ages 25-44, additional 30-45 deaths/1000 |
### Data Generation Process
1. **Country calibration**: Each country receives specific parameters for life expectancy, HIV prevalence, base mortality multiplier, policy persistency baseline, and GDP factor derived from published sources.
2. **Age-specific mortality**: Base mortality rates follow standard actuarial table shapes (high infant mortality, low juvenile, rising adult mortality) with SSA-specific adjustments for infectious disease burden.
3. **HIV mortality modeling**: HIV prevalence modulates age-specific mortality impact, with peak effects in the 25-44 age groups consistent with SSA epidemiological patterns.
4. **Gender differentials**: Male mortality set at ~15% above female mortality, consistent with observed SSA patterns.
5. **Policy metrics**: Premium, sum assured, persistency, and lapse rates calibrated to SSA insurance market studies, with GDP factors adjusting for purchasing power differences.
6. **Actuarial quality classification**: Derived from solvency ratio, reserve adequacy, and loss ratio using a composite scoring system.
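
The generation code is not included in this card. As a rough sketch of how steps 2-4 might compose, the example below assumes a multiplicative country factor, an additive HIV excess proportional to prevalence in the peak age groups, and the ~15% male loading applied last. The function name `mortality_per_1000`, the base rates, and the `hiv_excess_at_full_prevalence` scale are hypothetical placeholders, not the dataset's actual parameters.

```python
# Illustrative placeholders only: NOT the parameters used to build the dataset.
BASE_QX_PER_1000 = {  # hypothetical base deaths per 1000 by age group
    "0-4": 12.0, "5-14": 1.0, "15-24": 2.0, "25-34": 3.5, "35-44": 5.5,
    "45-54": 10.0, "55-64": 20.0, "65-74": 45.0, "75+": 110.0,
}
HIV_PEAK_AGES = {"25-34", "35-44"}  # peak-impact ages named in the card
MALE_LOADING = 1.15                 # male mortality ~15% above female


def mortality_per_1000(age_group, gender, country_multiplier,
                       hiv_prevalence_pct, hiv_excess_at_full_prevalence=300.0):
    """Compose a mortality rate from base shape, country factor, HIV, and gender.

    hiv_excess_at_full_prevalence is a hypothetical scale: the excess deaths
    per 1000 that would apply if prevalence were 100% in a peak-age group.
    """
    rate = BASE_QX_PER_1000[age_group] * country_multiplier
    if age_group in HIV_PEAK_AGES:
        rate += hiv_excess_at_full_prevalence * hiv_prevalence_pct / 100.0
    if gender == "Male":
        rate *= MALE_LOADING
    return rate


# Example: ages 25-34, assumed country multiplier 1.2, HIV prevalence 13%.
female = mortality_per_1000("25-34", "Female", 1.2, 13.0)
male = mortality_per_1000("25-34", "Male", 1.2, 13.0)
```

With these placeholder values the HIV term adds 39 deaths per 1000 at 13% prevalence, which falls inside the 30-45 range quoted in the evidence table. Whether the real generator applies the gender loading before or after the HIV excess is not stated in the card.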
### Scenario Design
| Scenario | Description | Mortality Adjustment | HIV Impact | Persistency |
|----------|-------------|---------------------|------------|-------------|
| **baseline** | Current SSA conditions | 1.0× (no adjustment) | Standard | Country baseline |
| **improved_mortality** | ART scale-up, healthcare improvement | 0.85× (15% reduction) | 0.7× (30% reduction) | +3 percentage points |
| **pandemic_impact** | Health system disruption | 1.35× (35% increase) | 1.2× (20% increase) | -5 percentage points |
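
The factors in the table can be read as a small lookup applied to baseline quantities. The sketch below assumes that the mortality adjustment scales the non-HIV component, that the HIV factor scales the HIV-attributable component (with "Standard" taken to mean 1.0×), and that persistency is clipped to the 40-95 range given in the schema. The card does not confirm how the two multipliers combine, and `Scenario` and `apply_scenario` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    mortality_factor: float      # multiplier on base (non-HIV) mortality
    hiv_factor: float            # multiplier on HIV-attributable mortality
    persistency_shift_pp: float  # additive shift in percentage points


# Factors taken from the Scenario Design table ("Standard" assumed to be 1.0).
SCENARIOS = {
    "baseline": Scenario(1.0, 1.0, 0.0),
    "improved_mortality": Scenario(0.85, 0.7, 3.0),
    "pandemic_impact": Scenario(1.35, 1.2, -5.0),
}


def apply_scenario(name, base_mortality, hiv_excess, persistency_pct):
    """Return (mortality per 1000, persistency %) under the named scenario."""
    s = SCENARIOS[name]
    mortality = base_mortality * s.mortality_factor + hiv_excess * s.hiv_factor
    # Assumption: clip to the 40-95 range documented in the schema.
    persistency = min(95.0, max(40.0, persistency_pct + s.persistency_shift_pp))
    return mortality, persistency


# Example: compare the three scenarios for one hypothetical cell.
for scenario_name in SCENARIOS:
    print(scenario_name, apply_scenario(scenario_name, 5.0, 39.0, 70.0))
```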
## Schema
| Variable | Type | Description | Range |
|----------|------|-------------|-------|
| record_id | int | Unique record identifier | 0-9999 |
| country | str | SSA country name | 12 countries |
| year | int | Policy year | 2019-2024 |
| age_group | str | Age bracket | 0-4, 5-14, 15-24, 25-34, 35-44, 45-54, 55-64, 65-74, 75+ |
| gender | str | Biological sex | Male, Female |
| mortality_rate_per_1000 | float | Deaths per 1000 population | 0.5-500 |
| life_expectancy_years | float | Remaining life expectancy | 30-90 |
| morbidity_rate_pct | float | Population with significant health conditions | 1-95 |
| hiv_prevalence_pct | float | HIV prevalence in population | 0.1-30 |
| policy_count_thousands | float | Number of policies (thousands) | 10-2000 |
| premium_per_policy_usd | float | Annual premium per policy | 10-500 |
| sum_assured_avg_usd | float | Average sum assured | 500-50000 |
| policy_persistency_rate_pct | float | Annual policy persistency rate | 40-95 |
| lapse_rate_pct | float | Annual policy lapse rate | 2-40 |
| claim_frequency_per_1000_policies | float | Claims per 1000 policies | 0.5-500 |
| avg_claim_amount_usd | float | Average claim payout | 200-30000 |
| loss_ratio_pct | float | Claims as % of premiums | 20-150 |
| reserve_adequacy_ratio | float | Reserve adequacy indicator | 0.5-2.0 |
| underwriting_profit_margin_pct | float | Underwriting profit margin | -30 to 40 |
| product_type | str | Insurance product type | term_life, whole_life, endowment, group_life, credit_life |
| distribution_channel | str | Sales channel | agent, bancassurance, corporate, digital |
| commission_rate_pct | float | Agent/broker commission rate | 1-25 |
| expense_ratio_pct | float | Operating expense ratio | 3-35 |
| solvency_ratio | float | Capital adequacy ratio | 0.5-3.5 |
| actuarial_quality_class | str | Overall actuarial quality | strong, adequate, weak, critical |
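
A loaded scenario can be checked against this table before analysis. The following is a minimal sketch, assuming the listed ranges are inclusive hard bounds (the card does not say whether they are strict limits or typical ranges) and covering only a subset of the numeric columns. `schema_problems` is a hypothetical helper, not part of the dataset's tooling.

```python
import pandas as pd

# Allowed values copied from the schema table.
CATEGORICAL_VALUES = {
    "gender": {"Male", "Female"},
    "age_group": {"0-4", "5-14", "15-24", "25-34", "35-44",
                  "45-54", "55-64", "65-74", "75+"},
    "product_type": {"term_life", "whole_life", "endowment",
                     "group_life", "credit_life"},
    "distribution_channel": {"agent", "bancassurance", "corporate", "digital"},
    "actuarial_quality_class": {"strong", "adequate", "weak", "critical"},
}
# A few of the numeric ranges from the table: (min, max), treated as inclusive.
NUMERIC_RANGES = {
    "mortality_rate_per_1000": (0.5, 500),
    "policy_persistency_rate_pct": (40, 95),
    "lapse_rate_pct": (2, 40),
    "solvency_ratio": (0.5, 3.5),
}


def schema_problems(df):
    """Return a list of human-readable schema violations (empty if none)."""
    problems = []
    for col, allowed in CATEGORICAL_VALUES.items():
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif not set(df[col].dropna().unique()) <= allowed:
            problems.append(f"unexpected values in {col}")
    for col, (lo, hi) in NUMERIC_RANGES.items():
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif not df[col].between(lo, hi).all():  # NaN counts as out of range
            problems.append(f"{col} outside [{lo}, {hi}]")
    return problems
```

Typical use would be `schema_problems(pd.read_csv("data/baseline.csv"))`, with an empty list indicating that the checked columns match the table.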
## Summary Statistics
### Baseline Scenario
| Metric | Mean | Std | Min | Max |
|--------|------|-----|-----|-----|
| Mortality rate (per 1000) | — | — | — | — |
| Life expectancy (years) | — | — | — | — |
| HIV prevalence (%) | — | — | — | — |
| Persistency rate (%) | — | — | — | — |
| Solvency ratio | — | — | — | — |
*Full statistics available in `summary_statistics.json`*
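
The summary table in this copy of the card has no values filled in. They can be recomputed from the CSV. The sketch below assumes the five metrics map to the schema columns shown in `SUMMARY_COLUMNS`; that mapping and the `summary_table` helper are assumptions made for illustration.

```python
import pandas as pd

# Assumed mapping from schema column names to the metric labels in the table.
SUMMARY_COLUMNS = {
    "mortality_rate_per_1000": "Mortality rate (per 1000)",
    "life_expectancy_years": "Life expectancy (years)",
    "hiv_prevalence_pct": "HIV prevalence (%)",
    "policy_persistency_rate_pct": "Persistency rate (%)",
    "solvency_ratio": "Solvency ratio",
}


def summary_table(df):
    """Return mean, sample std, min, and max for the five summary metrics."""
    stats = df[list(SUMMARY_COLUMNS)].agg(["mean", "std", "min", "max"]).T
    stats.index = [SUMMARY_COLUMNS[col] for col in stats.index]
    stats.columns = ["Mean", "Std", "Min", "Max"]
    return stats.round(2)


# Usage (requires the dataset files locally):
# print(summary_table(pd.read_csv("data/baseline.csv")))
```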
## Validation Results
The dataset undergoes 15+ plausibility checks, including:
- **Schema validation**: All 25 columns present, correct types, no duplicates
- **Categorical validation**: Expected values for all categorical variables
- **Mortality age gradient**: U-shaped pattern (high infant, low juvenile, rising adult)
- **Life expectancy ranges**: Country-specific calibration within ±8 years
- **HIV prevalence**: Country-specific calibration within ±5 percentage points
- **HIV-mortality correlation**: Positive correlation expected
- **Persistency-lapse relationship**: Negative correlation, sum < 100%
- **Actuarial metrics**: All within plausible bounds
- **Morbidity rates**: Increasing with age, elevated vs. developed markets
- **Gender differential**: Male mortality > female mortality
- **Policy metrics**: Positive premiums, reasonable pricing
- **Actuarial quality logic**: Quality class correlates with solvency
- **Year distribution**: Uniform across 2019-2024
- **Cross-scenario monotonicity**: Improved < Baseline < Pandemic for mortality
Diagnostic plots (8 panels) are generated and saved as `diagnostic_plots.png`.
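
The validation code itself is not part of this card. Three of the checks listed above might look roughly like the sketch below. It assumes the gender and cross-scenario checks compare overall means and that the persistency-lapse check is applied row by row. The function names are hypothetical, and the real checks may be stricter (for example, per country or per age group).

```python
import pandas as pd


def check_gender_differential(df):
    """Mean male mortality should exceed mean female mortality."""
    means = df.groupby("gender")["mortality_rate_per_1000"].mean()
    return bool(means["Male"] > means["Female"])


def check_persistency_lapse(df):
    """Persistency and lapse should sum to less than 100% on every row."""
    total = df["policy_persistency_rate_pct"] + df["lapse_rate_pct"]
    return bool((total < 100).all())


def check_scenario_monotonicity(improved, baseline, pandemic):
    """Mean mortality should rise from improved to baseline to pandemic."""
    col = "mortality_rate_per_1000"
    return bool(improved[col].mean() < baseline[col].mean() < pandemic[col].mean())
```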
## Usage
```python
import pandas as pd

# Load each scenario
baseline = pd.read_csv("data/baseline.csv")
improved = pd.read_csv("data/improved_mortality.csv")
pandemic = pd.read_csv("data/pandemic_impact.csv")

# Mean mortality by country (baseline scenario)
mortality_by_country = baseline.groupby("country")["mortality_rate_per_1000"].mean()

# Correlation between HIV prevalence and mortality
hiv_mortality_corr = baseline["hiv_prevalence_pct"].corr(baseline["mortality_rate_per_1000"])

# Share of records in each actuarial quality class
quality_dist = baseline["actuarial_quality_class"].value_counts(normalize=True)
```
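
The snippet above loads the three scenarios but does not compare them. One way to do so is to stack the frames with a scenario label and pivot, as sketched below. `stack_scenarios` and `mortality_by_country_and_scenario` are hypothetical helper names. Because `record_id` runs 0-9999 within each file, it is no longer unique after stacking.

```python
import pandas as pd


def stack_scenarios(frames):
    """Concatenate {scenario_name: DataFrame} into one frame with a 'scenario' column."""
    return pd.concat(
        [df.assign(scenario=name) for name, df in frames.items()],
        ignore_index=True,
    )


def mortality_by_country_and_scenario(stacked):
    """Mean mortality per country (rows) and scenario (columns)."""
    return stacked.pivot_table(
        index="country",
        columns="scenario",
        values="mortality_rate_per_1000",
        aggfunc="mean",
    )


# Usage (requires the dataset files locally):
# frames = {
#     name: pd.read_csv(f"data/{name}.csv")
#     for name in ("baseline", "improved_mortality", "pandemic_impact")
# }
# print(mortality_by_country_and_scenario(stack_scenarios(frames)))
```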
## Limitations
1. **Synthetic data**: Generated from calibrated distributions, not observed policy records. Results should not be used for actual pricing or reserving without validation against local experience.
2. **Country aggregation**: Within-country heterogeneity (urban/rural, socioeconomic) is not captured.
3. **Static HIV prevalence**: HIV prevalence is held constant within each country; temporal trends are not modeled.
4. **Simplified product dynamics**: Product features (riders, bonuses, guarantees) are not modeled.
5. **Regulatory variation**: Country-specific regulatory requirements and capital standards are not differentiated.
6. **Currency effects**: All values in USD; local currency volatility and inflation are not modeled.
## References
1. Actuarial Society of South Africa. "ASSA2017 Mortality Tables." Cape Town, 2017.
2. Actuarial Society of South Africa. "ASSA HIV Model." Cape Town, 2023.
3. Kenya Actuarial Society. "KE 2007-2010 Mortality Tables." Nairobi, 2012.
4. World Health Organization. "Global Health Observatory: Life Expectancy." Geneva, 2023.
5. UNAIDS. "Global AIDS Update 2023." Geneva, 2023.
6. Swiss Re Institute. "Sigma: Life Insurance in Emerging Markets." Zurich, 2023.
7. African Development Bank. "African Economic Outlook 2024." Abidjan, 2024.
## Citation
```bibtex
@dataset{african_life_insurance_actuarial_2024,
author = {Electric Sheep Africa},
title = {African Life Insurance Actuarial Dataset},
year = {2024},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/electricsheepafrica/african-life-insurance-actuarial},
license = {CC-BY-4.0}
}
```
## License
This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC-BY-4.0).
Provider:
electricsheepafrica



