electricsheepafrica/africa-aid-worker-security-database-sdn
Source: Hugging Face · Updated 2026-04-11 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-aid-worker-security-database-sdn
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- tabular-classification
- other
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- aid-worker-security
- aid-workers
- conflict-violence
- sdn
pretty_name: "Sudan - Aid Worker Security Database"
dataset_info:
  splits:
  - name: train
    num_examples: 362
  - name: test
    num_examples: 90
---
# Sudan - Aid Worker Security Database
**Publisher:** Humanitarian Outcomes · **Source:** [HDX](https://data.humdata.org/dataset/aid-worker-security-database-sdn) · **License:** `cc-by` · **Updated:** 2026-04-10
---
## Abstract
This dataset records security incidents affecting aid workers in Sudan. Each year, the data for the previous year undergoes a verification process; data for the current year is provisional. For incident descriptions, please download the data directly from [www.aidworkersecurity.org](https://www.aidworkersecurity.org).
Each row in this dataset represents a discrete event or incident. The data was last updated on HDX on 2026-04-10. Geographic scope: **SDN**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Conflict and security |
| **Unit of observation** | Discrete events or incidents |
| **Rows (total)** | 453 |
| **Columns** | 46 (30 numeric, 16 categorical, 0 datetime) |
| **Train split** | 362 rows |
| **Test split** | 90 rows |
| **Geographic scope** | SDN |
| **Publisher** | Humanitarian Outcomes |
| **HDX last updated** | 2026-04-10 |
---
## Variables
**Geographic:** `country_code` (SD), `country` (Sudan), `region` (South Darfur, Khartoum, North Darfur) and 7 others.
**Temporal:** `year` (range 1998–2026), `month` (range 1–12), `day` (range 1–31).
**Demographic:** `gender_male`, `gender_female`, `gender_unknown`.
**Outcome / Measurement:** `total_nationals` (range 0–18), `total_internationals` (range 0–3), `nationals_kidnapped` (range 0–8), `internationals_kidnapped` (range 0–3), `total_killed`, `total_wounded`, `total_kidnapped` and 2 others.
**Identifier / Metadata:** `incident_id` (range 44–5790), `actor_name`, `source` and 2 others.
**Other:** `un` (range 0–7), `ingo` (range 0–18), `icrc` (range 0–5), `nrcs_and_ifrc` (range 0–10), `nngo` (range 0–8) and 11 others.
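The card lists no datetime columns, so any time-series work has to assemble a date from `year`, `month`, and `day`, two of which contain nulls. The following is a minimal sketch of one way to do that, not part of the published pipeline; the helper name `build_incident_date` and the toy rows are hypothetical. Rows with a missing month or day are left as `NaT` rather than imputed, which is a design choice you may want to revisit (for example, by falling back to the first of the month).

```python
import pandas as pd


def build_incident_date(df: pd.DataFrame) -> pd.Series:
    """Return a datetime Series built from year/month/day; incomplete rows become NaT."""
    parts = df[["year", "month", "day"]]
    complete = parts.notna().all(axis=1)
    dates = pd.Series(pd.NaT, index=df.index, dtype="datetime64[ns]")
    # Cast only the complete rows to int so that 6.0 becomes 6 before assembly.
    assembled = pd.to_datetime(parts[complete].astype("int64"), errors="coerce")
    dates.loc[complete] = assembled
    return dates


# Toy rows standing in for the real data (values are illustrative only).
toy = pd.DataFrame(
    {
        "year": [2014, 2023, 1998],
        "month": [6.0, None, 2.0],
        "day": [16.0, 3.0, None],
    }
)
toy["incident_date"] = build_incident_date(toy)
print(toy)
```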
---
## Quick Start
```python
from datasets import load_dataset
ds = load_dataset("electricsheepafrica/africa-aid-worker-security-database-sdn")
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()
print(train.shape)
train.head()
```
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `incident_id` | int64 | 0.0% | 44 – 5790 (mean 2453.9735) |
| `year` | int64 | 0.0% | 1998 – 2026 (mean 2014.8344) |
| `month` | float64 | 0.7% | 1 – 12 (mean 6.3756) |
| `day` | float64 | 8.2% | 1 – 31 (mean 15.6202) |
| `country_code` | object | 0.0% | SD |
| `country` | object | 0.0% | Sudan |
| `region` | object | 11.3% | South Darfur, Khartoum, North Darfur |
| `district` | object | 21.9% | Khartoum, Al Fasher, Ag Geneina |
| `city` | object | 25.6% | Nyala, Khartoum, Al Fasher |
| `un` | int64 | 0.0% | 0 – 7 (mean 0.3996) |
| `ingo` | int64 | 0.0% | 0 – 18 (mean 0.8565) |
| `icrc` | int64 | 0.0% | 0 – 5 (mean 0.0375) |
| `nrcs_and_ifrc` | int64 | 0.0% | 0 – 10 (mean 0.1148) |
| `nngo` | int64 | 0.0% | 0 – 8 (mean 0.4084) |
| `other` | int64 | 0.0% | 0 – 2 (mean 0.011) |
| `nationals_killed` | int64 | 0.0% | 0 – 9 (mean 0.6071) |
| `nationals_wounded` | int64 | 0.0% | 0 – 18 (mean 0.6468) |
| `nationals_kidnapped` | int64 | 0.0% | 0 – 8 (mean 0.3996) |
| `nationals_detained` | int64 | 0.0% | 0 – 6 (mean 0.0419) |
| `total_nationals` | int64 | 0.0% | 0 – 18 (mean 1.6954) |
| `internationals_killed` | int64 | 0.0% | 0 – 1 (mean 0.0199) |
| `internationals_wounded` | int64 | 0.0% | 0 – 3 (mean 0.0419) |
| `internationals_kidnapped` | int64 | 0.0% | 0 – 3 (mean 0.0706) |
| `internationals_detained` | int64 | 0.0% | 0 – 0 (mean 0.0) |
| `total_internationals` | int64 | 0.0% | 0 – 3 (mean 0.1325) |
| `total_killed` | int64 | 0.0% | |
| `total_wounded` | int64 | 0.0% | |
| `total_kidnapped` | int64 | 0.0% | |
| `total_detained` | int64 | 0.0% | |
| `total_affected` | int64 | 0.0% | |
| `gender_male` | int64 | 0.0% | |
| `gender_female` | int64 | 0.0% | |
| `gender_unknown` | int64 | 0.0% | |
| `means_of_attack` | object | 0.0% | Shooting, Kidnapping, Bodily assault |
| `attack_context` | object | 0.0% | Ambush, Unknown, Raid |
| `location` | object | 0.0% | Road, Unknown, Project site |
| `latitude` | float64 | 0.0% | |
| `longitude` | float64 | 0.0% | |
| `motive` | object | 0.0% | Unknown, Incidental, Economic |
| `actor_type` | object | 0.0% | Unknown, Non-state armed group: National, Host state |
| `actor_name` | object | 0.0% | |
| `details` | object | 0.0% | |
| `verified` | object | 0.2% | |
| `source` | object | 0.0% | |
| `esa_source` | object | 0.0% | |
| `esa_processed` | object | 0.0% | |
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `incident_id` | 44 | 5790 | 2453.9735 | 1775 |
| `year` | 1998 | 2026 | 2014.8344 | 2014 |
| `month` | 1 | 12 | 6.3756 | 6 |
| `day` | 1 | 31 | 15.6202 | 16 |
| `un` | 0 | 7 | 0.3996 | 0 |
| `ingo` | 0 | 18 | 0.8565 | 0 |
| `icrc` | 0 | 5 | 0.0375 | 0 |
| `nrcs_and_ifrc` | 0 | 10 | 0.1148 | 0 |
| `nngo` | 0 | 8 | 0.4084 | 0 |
| `other` | 0 | 2 | 0.011 | 0 |
| `nationals_killed` | 0 | 9 | 0.6071 | 0 |
| `nationals_wounded` | 0 | 18 | 0.6468 | 0 |
| `nationals_kidnapped` | 0 | 8 | 0.3996 | 0 |
| `nationals_detained` | 0 | 6 | 0.0419 | 0 |
| `total_nationals` | 0 | 18 | 1.6954 | 1 |
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
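The curation steps described above can be approximated as follows. This is a sketch under stated assumptions, not the actual Electric Sheep Africa pipeline: the helper names (`to_snake_case`, `clean`, `split_80_20`) and the toy rows are hypothetical, the split uses `DataFrame.sample` where the original tooling is not documented, and the Parquet export is omitted. Marker matching is exact and case-sensitive, an assumption made because capitalised `Unknown` still appears as a value in the published schema.

```python
import re

import pandas as pd

# Markers listed in the Curation paragraph; exact, case-sensitive matching is an assumption.
MISSING_MARKERS = ["N/A", "null", "none", "-", "unknown", "no data", "#N/A"]


def to_snake_case(name: str) -> str:
    """Lowercase a column name and replace runs of non-alphanumerics with underscores."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Rename columns to snake_case and turn missing-value markers into NaN."""
    out = df.rename(columns=to_snake_case)
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            continue  # markers can only occur in text columns
        is_marker = out[col].isin(MISSING_MARKERS)
        out[col] = out[col].mask(is_marker)
    return out


def split_80_20(df: pd.DataFrame, seed: int = 42):
    """Sample 20% of rows as the test set; the remainder is the training set."""
    test = df.sample(frac=0.2, random_state=seed)
    train = df.drop(test.index)
    return train, test


# Hypothetical raw rows; column names and values are invented for illustration.
raw = pd.DataFrame(
    {
        "Incident ID": range(1, 11),
        "Actor Name": ["N/A", "Unknown", "-", "Militia", "null",
                       "Unknown", "no data", "Militia", "#N/A", "none"],
        "Means of attack": ["Shooting"] * 10,
    }
)
cleaned = clean(raw)
train, test = split_80_20(cleaned)
print(list(cleaned.columns))
print(cleaned["actor_name"].isna().sum(), len(train), len(test))
```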
---
## Limitations
- Data originates from Humanitarian Outcomes and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- The following columns have >20% missing values and should be treated with caution in modelling: `district`, `city`.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/aid-worker-security-database-sdn) for the publisher's own methodology notes and caveats.
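The 20% missing-value threshold mentioned in the list above can be checked on any loaded split. The sketch below is illustrative only: `high_missing_columns` is a hypothetical helper, the toy frame uses invented values, and dropping the flagged columns is just one possible treatment (imputation or an explicit "missing" category are alternatives).

```python
import pandas as pd


def high_missing_columns(df: pd.DataFrame, threshold: float = 0.20) -> pd.Series:
    """Return the null share of each column whose share exceeds `threshold`, largest first."""
    null_share = df.isna().mean()
    return null_share[null_share > threshold].sort_values(ascending=False)


# Toy frame with invented values: 10%, 30%, and 40% missing respectively.
toy = pd.DataFrame(
    {
        "region": ["Khartoum"] * 9 + [None],
        "district": ["Khartoum"] * 7 + [None] * 3,
        "city": ["Nyala"] * 6 + [None] * 4,
    }
)
flagged = high_missing_columns(toy)
print(flagged)

# One option: exclude the flagged columns before modelling.
model_ready = toy.drop(columns=flagged.index)
```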
---
## Citation
```bibtex
@dataset{hdx_africa_aid_worker_security_database_sdn,
title = {Sudan - Aid Worker Security Database},
author = {Humanitarian Outcomes},
year = {2026},
url = {https://data.humdata.org/dataset/aid-worker-security-database-sdn},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
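Provider:
electricsheepafrica
Summary
Dataset overview

Construction
In conflict and security research, high-quality event databases are essential for understanding humanitarian crises. This dataset is collected and maintained by Humanitarian Outcomes and records security incidents affecting aid workers in Sudan. The source data goes through an annual verification process and is published on the Humanitarian Data Exchange (HDX). The Electric Sheep Africa team then retrieved the raw data via the CKAN API and applied a standard cleaning pipeline: unifying missing-value markers, normalising column names to snake_case, and converting the data to ML-ready Parquet. Finally, the data was split 80/20 into training and test sets using a fixed random seed.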
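Characteristics
The dataset contains 453 observations across 46 variables, with each record corresponding to one security incident. The variables describe each incident's location, timing, demographics, and consequences: for example, geographic detail down to region and city, and counts of people killed, wounded, or kidnapped. The dataset distinguishes between national and international aid workers affected and includes qualitative fields for means of attack, attack context, and motive. Some geographic fields have missing values, but the core security indicators are essentially complete, which gives quantitative analysis a solid footing. The data spans 1998 to 2026 and can support long-run trend studies.
Usage
The dataset suits classification, regression, and pattern-discovery tasks. It can be loaded with the Hugging Face `datasets` library and converted to a pandas DataFrame for exploratory analysis using the snippet in the dataset card. When modelling, prefer the complete core security columns and treat the geographic fields with high missing rates with care. The data is already split into training and test sets, so models can be trained and evaluated directly. Because the data comes from field reports, users should consult the original publisher's methodology notes and account for possible reporting bias and definitional inconsistencies to keep their conclusions robust.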
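Background and challenges
Background
In conflict and security research, the safety of humanitarian aid workers is a long-standing international concern. The Sudan Aid Worker Security Database is created and maintained by Humanitarian Outcomes and covers aid worker security incidents in Sudan from 1998 to 2026. With discrete events as the unit of observation, it records 46 variables covering the geographic distribution, timing, casualties, and perpetrators of attacks. By systematically collecting cases of violence against aid workers in conflict areas, the database provides a basis for quantifying the risks of humanitarian operations and can inform regional security policy and the allocation of aid resources.
Current challenges
The dataset targets pattern recognition and risk prediction for aid worker security incidents in conflict areas; its core challenge is the sparsity and heterogeneity of event data. Because incidents often occur in unstable settings where information is limited, data collection faces incomplete reporting and difficult verification, which leaves some fields with high missing rates. The source data may contain inconsistent definitions, misreported values, and sampling bias, which automated cleaning cannot fully correct. In addition, more than 20% of values are missing in the `district` and `city` fields, which makes precise geospatial modelling difficult and calls for robust statistical methods that account for uncertainty in data quality.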
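Common scenarios
Typical use cases
In conflict and security research, the dataset provides structured event records for analysing the security threats that humanitarian workers face in Sudan. Researchers typically use its temporal, geographic, and incident attributes with statistical or machine learning methods to identify the spatio-temporal distribution of attacks, trends in means of attack, and the risk profiles of different actors, and thereby to examine how conflict dynamics affect humanitarian operations.
Academic questions addressed
The dataset supports research on patterns of violence by non-state actors, the security costs of humanitarian intervention, and the prediction of attacks in fragile regions. By providing standardised event data, it enables quantitative analysis of the relationship between conflict intensity and aid worker victimisation rates, contributes to theory on civilian protection in conflict settings, and offers an empirical basis for assessing the implementation of international humanitarian law.
Derived work
Work derived from this dataset includes using spatio-temporal point process models to predict attack hotspots and applying classification algorithms to identify high-risk attack scenarios. This research has further supported the development of conflict early-warning systems and prompted initiatives to standardise and integrate security data across regions, providing a data foundation for humanitarian informatics as an interdisciplinary field.
The content above was collected and summarised by 遇见数据集.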



