electricsheepafrica/african-data-breach-registry
Source: Hugging Face. Updated 2026-03-21; indexed 2026-03-29.
Download link: https://hf-mirror.com/datasets/electricsheepafrica/african-data-breach-registry
---
license: cc-by-4.0
tags:
- cybersecurity
- data-breach
- privacy
- sub-saharan-africa
- synthetic
- regulatory
language:
- en
pretty_name: African Data Breach Registry
size_categories:
- 10K<n<100K
task_categories:
- tabular-regression
- tabular-classification
---
# African Data Breach Registry
Synthetic dataset of data breach incidents across 12 Sub-Saharan African countries, modeled on real regulatory frameworks including Nigeria's Data Protection Act (NDPA), Kenya's Data Protection Act 2019, and South Africa's POPIA.
## Dataset Description
- **3 scenarios**: `baseline`, `enforcement_improved`, `breach_surge`
- **10,000 records per scenario** (30,000 total)
- **12 countries**: Nigeria, Kenya, South Africa, Ghana, Tanzania, Uganda, Rwanda, Ethiopia, Senegal, Côte d'Ivoire, Zambia, Mozambique
- **Time period**: 2021–2025
## Variables
| Variable | Type | Description |
|---|---|---|
| record_id | int | Unique record identifier |
| country | str | Country name |
| dpa | str | Data Protection Authority |
| year | int | Breach year |
| month | int | Breach month |
| breach_date | date | Date breach occurred (YYYY-MM-DD) |
| discovery_date | date | Date breach was discovered |
| notification_date | date | Date breach was reported |
| sector | str | Affected sector (financial/healthcare/telecom/government/retail/education) |
| breach_type | str | Breach cause (hacking/insider/malware/phishing/physical/accidental_disclosure) |
| records_exposed | int | Number of records compromised |
| records_with_pii | int | Records containing PII |
| data_types | str | Pipe-separated data types (names/emails/phones/financial/health/biometric) |
| notification_delay_days | int | Days from breach to notification |
| regulatory_response | str | Regulatory action (investigation/fine/none) |
| remediation_cost_usd | float | Estimated remediation cost (USD) |
| public_disclosure_flag | int | Whether breach was publicly disclosed (0/1) |
| dpa_notification_flag | int | Whether DPA was notified (0/1) |
| breach_severity | str | Severity class (minor/moderate/major/critical) |
| affected_individuals_thousands | float | Affected individuals in thousands |
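The date columns and the pipe-separated `data_types` field need a little parsing before analysis. The following is a minimal sketch, assuming the column names in the table above and that `notification_delay_days` equals `notification_date` minus `breach_date` (the table describes it that way, but this has not been checked against the files). The two sample rows are made up for illustration and are not dataset records; `load_breaches` and `recomputed_delay` are hypothetical helper names.

```python
import io

import pandas as pd

# Two made-up rows that follow the schema above; NOT real dataset records.
SAMPLE_CSV = """record_id,breach_date,discovery_date,notification_date,data_types,notification_delay_days
1,2023-01-10,2023-01-20,2023-02-09,names|emails|financial,30
2,2024-06-01,2024-06-03,2024-06-15,health|biometric,14
"""

DATE_COLUMNS = ["breach_date", "discovery_date", "notification_date"]


def load_breaches(source):
    """Read a scenario CSV, parse the date columns, and split data_types."""
    df = pd.read_csv(source, parse_dates=DATE_COLUMNS)
    # "names|emails" -> ["names", "emails"]
    df["data_type_list"] = df["data_types"].str.split("|")
    return df


def recomputed_delay(df):
    """Days between breach and notification, derived from the date columns."""
    return (df["notification_date"] - df["breach_date"]).dt.days


sample = load_breaches(io.StringIO(SAMPLE_CSV))
sample["delay_check"] = recomputed_delay(sample)
print(sample[["record_id", "data_type_list", "notification_delay_days", "delay_check"]])
```

On the real files, pass a path such as `"data/baseline.csv"` to `load_breaches` instead of the in-memory sample, and compare `delay_check` with `notification_delay_days` to see whether the assumption holds.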
## Scenarios
- **baseline**: Current regulatory enforcement levels
- **enforcement_improved**: Strengthened DPA capacity, faster notification, higher fines
- **breach_surge**: Increased attack volume with weakened enforcement
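One way to compare the scenarios is to stack them into a single frame with a `scenario` label and aggregate per scenario. The sketch below assumes the column names from the Variables table; `combine_scenarios` and `summarize` are hypothetical helpers, and the toy frames contain made-up values rather than dataset records.

```python
import pandas as pd


def combine_scenarios(frames):
    """Stack scenario frames, tagging each row with its scenario name.

    frames: dict mapping scenario name -> DataFrame.
    """
    return pd.concat(
        [df.assign(scenario=name) for name, df in frames.items()],
        ignore_index=True,
    )


def summarize(combined):
    """Median notification delay and share of breaches fined, per scenario."""
    return combined.groupby("scenario").agg(
        median_delay_days=("notification_delay_days", "median"),
        fine_rate=("regulatory_response", lambda s: (s == "fine").mean()),
    )


# Made-up values for illustration only.
toy = {
    "baseline": pd.DataFrame(
        {"notification_delay_days": [30, 50], "regulatory_response": ["none", "fine"]}
    ),
    "enforcement_improved": pd.DataFrame(
        {"notification_delay_days": [10, 20], "regulatory_response": ["fine", "fine"]}
    ),
}

summary = summarize(combine_scenarios(toy))
print(summary)
```

With the real data, build the dictionary from the three CSV files, for example `{"baseline": pd.read_csv("data/baseline.csv"), ...}`.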
## Regulatory Context
The dataset models enforcement dynamics of:
- **Nigeria**: Nigeria Data Protection Act 2023 (NDPA) / Nigeria Data Protection Regulation (NDPR)
- **Kenya**: Data Protection Act 2019 / Office of the Data Protection Commissioner (ODPC)
- **South Africa**: Protection of Personal Information Act (POPIA) / Information Regulator
- Other countries with emerging or absent data protection frameworks
## Usage
```python
import pandas as pd

# Paths are relative to the repository root; each file holds one scenario.
baseline = pd.read_csv("data/baseline.csv")
enforcement = pd.read_csv("data/enforcement_improved.csv")
surge = pd.read_csv("data/breach_surge.csv")
```
## Generate & Validate
```bash
pip install -r requirements.txt
python generate_dataset.py
python validate_dataset.py
```
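The contents of `validate_dataset.py` are not shown here, so the following is only a sketch of the kind of checks the schema suggests, not the repository's script. It assumes the slash-separated value lists in the Variables table are exhaustive and that `records_with_pii` never exceeds `records_exposed`; neither is confirmed by the source. `ALLOWED_VALUES` and `check_schema` are hypothetical names, and the toy rows are made up.

```python
import pandas as pd

# Allowed categories, copied from the Variables table (assumed exhaustive).
ALLOWED_VALUES = {
    "sector": {"financial", "healthcare", "telecom", "government", "retail", "education"},
    "breach_type": {"hacking", "insider", "malware", "phishing", "physical",
                    "accidental_disclosure"},
    "regulatory_response": {"investigation", "fine", "none"},
    "breach_severity": {"minor", "moderate", "major", "critical"},
}
FLAG_COLUMNS = ("public_disclosure_flag", "dpa_notification_flag")


def check_schema(df):
    """Return a list of human-readable problems; an empty list means no issues."""
    problems = []
    for col, allowed in ALLOWED_VALUES.items():
        unexpected = set(df[col].dropna().unique()) - allowed
        if unexpected:
            problems.append(f"{col}: unexpected values {sorted(unexpected)}")
    for flag in FLAG_COLUMNS:
        if not df[flag].isin([0, 1]).all():
            problems.append(f"{flag}: values other than 0/1")
    # Assumption: PII records are a subset of all exposed records.
    if (df["records_with_pii"] > df["records_exposed"]).any():
        problems.append("records_with_pii exceeds records_exposed")
    return problems


# Made-up rows: the first is valid, the second violates three checks.
toy = pd.DataFrame({
    "sector": ["financial", "mining"],
    "breach_type": ["phishing", "hacking"],
    "regulatory_response": ["none", "fine"],
    "breach_severity": ["minor", "critical"],
    "public_disclosure_flag": [0, 1],
    "dpa_notification_flag": [1, 2],
    "records_exposed": [100, 50],
    "records_with_pii": [80, 60],
})

problems = check_schema(toy)
for line in problems:
    print(line)
```

Running `check_schema` on each loaded scenario gives a quick sanity pass before modeling; any reported problem may equally mean that one of the assumptions above does not hold for this dataset.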
## License
CC-BY-4.0
Provider: electricsheepafrica