electricsheepafrica/africa-south-sudan-attacks-on-civilians-and-vital-civilian-facilities
Hugging Face · Updated 2026-04-10 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-south-sudan-attacks-on-civilians-and-vital-civilian-facilities
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-sa-4.0
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
- other
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- aid-worker-security
- aid-workers
- complex-emergency-conflict-security
- conflict-violence
- damage-assessment
- disease
- education
- education-facilities-schools
- ssd
pretty_name: "South Sudan (SSD): Attacks on Aid Operations, Education, Health Care, Food and Water Systems, and IDP/Refugee Camps, and Conflict-Related Sexual Violence Incident Data"
dataset_info:
  splits:
  - name: train
    num_examples: 458
  - name: test
    num_examples: 114
---
# South Sudan (SSD): Attacks on Aid Operations, Education, Health Care, Food and Water Systems, and IDP/Refugee Camps, and Conflict-Related Sexual Violence Incident Data
**Publisher:** Insecurity Insight · **Source:** [HDX](https://data.humdata.org/dataset/south-sudan-attacks-on-civilians-and-vital-civilian-facilities) · **License:** `cc-by-sa` · **Updated:** 2026-04-06
---
## Abstract
This page contains information on reported incidents of violence and threats affecting aid operations and workers, education, food systems, health care services, and refugee and IDP camps in [South Sudan](https://insecurityinsight.org/country-pages/south-sudan). It also provides information on incidents of conflict-related sexual violence (CRSV) and includes datasets cited in the [Safeguarding Health in Conflict Coalition (SHCC)](https://www.safeguardinghealth.org/)'s annual reports. For curated datasets, contact info@insecurityinsight.org.
Each row in this dataset represents one discrete event or incident. Temporal coverage is indicated by the `date` and `date_event_entered` columns. Geographic scope: **SSD**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
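
The temporal coverage mentioned above can be read directly from the `date` column. The following is a minimal sketch, assuming a split has already been loaded into a pandas DataFrame; the helper name `temporal_coverage` is hypothetical and the example dates are made up for illustration.

```python
import pandas as pd

def temporal_coverage(df: pd.DataFrame, column: str = "date") -> tuple[pd.Timestamp, pd.Timestamp]:
    """Return the earliest and latest timestamp found in `column`."""
    dates = pd.to_datetime(df[column])
    return dates.min(), dates.max()

# Tiny illustrative frame (made-up dates, not taken from the dataset)
example = pd.DataFrame({"date": ["2021-03-01", "2020-01-05", "2020-11-17"]})
first, last = temporal_coverage(example)
print(f"Coverage: {first.date()} to {last.date()}")
```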
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Food security and nutrition |
| **Unit of observation** | Discrete events or incidents |
| **Rows (total)** | 573 |
| **Columns** | 42 (26 numeric, 13 categorical, 3 datetime) |
| **Train split** | 458 rows |
| **Test split** | 114 rows |
| **Geographic scope** | SSD |
| **Publisher** | Insecurity Insight |
| **HDX last updated** | 2026-04-06 |
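
The row counts in the table can be cross-checked with simple arithmetic. As listed, the train and test splits sum to 458 + 114 = 572 rows, one fewer than the stated total of 573. The sketch below makes that comparison explicit; the helper name `check_split_counts` is hypothetical, and the figures are copied from the table rather than recomputed from the data.

```python
def check_split_counts(total: int, splits: dict[str, int]) -> int:
    """Return total minus the sum of split sizes (0 means the counts agree)."""
    return total - sum(splits.values())

# Figures copied from the table above
reported = {"train": 458, "test": 114}
difference = check_split_counts(573, reported)
print(f"Rows not accounted for by the splits: {difference}")
```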
---
## Variables
**Geographic** — `country` (South Sudan), `country_iso` (SSD), `admin_1` (Central Equatoria, Jonglei, Unity), `location_of_incident` (Road, No information, Compound or Office Building), `aid_workers_killed_in_captivity` (range 0.0–3.0) and 4 others.
**Temporal** — `date`, `date_event_entered`, `date_event_modified`.
**Demographic** — `female_aid_workers_killed` (range 0.0–1.0), `male_aid_workers_killed` (range 0.0–3.0), `female_aid_workers_injured` (range 0.0–2.0), `male_aid_workers_injured` (range 0.0–11.0), `female_aid_workers_kidnapped` (range 0.0–4.0) and 3 others.
**Outcome / Measurement** — `organisation_affected` (INGO, UN Agency, LNGO).
**Identifier / Metadata** — `reported_perpetrator_name` (Unidentified armed actor, Criminal, South Sudan National Police Service), `aid_workers_killed` (range 0.0–7.0), `aid_workers_injured` (range 0.0–11.0), `aid_workers_kidnapped` (range 0.0–11.0), `aid_workers_arrested` (range 0.0–18.0) and 12 others.
**Other** — `geo_precision` (censored), `reported_perpetrator` (NSA, No Information, Criminal), `weapon_carried_used` (Firearms, No Information on the Weapon Used, Knife), `programme_focus` (No information, Health, Multiple).
---
## Quick Start
```python
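from datasets import load_dataset

# Download both splits from the Hugging Face Hub
ds = load_dataset("electricsheepafrica/africa-south-sudan-attacks-on-civilians-and-vital-civilian-facilities")

# Convert each split to a pandas DataFrame for exploration
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())  # print explicitly so the preview also shows outside a notebook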
```
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `date` | datetime64[ns] | 0.0% | |
| `country` | object | 0.0% | South Sudan |
| `country_iso` | object | 0.0% | SSD |
| `admin_1` | object | 0.0% | Central Equatoria, Jonglei, Unity |
| `geo_precision` | object | 0.0% | censored |
| `location_of_incident` | object | 0.0% | Road, No information, Compound or Office Building |
| `reported_perpetrator` | object | 0.0% | NSA, No Information, Criminal |
| `reported_perpetrator_name` | object | 0.0% | Unidentified armed actor, Criminal, South Sudan National Police Service |
| `weapon_carried_used` | object | 0.0% | Firearms, No Information on the Weapon Used, Knife |
| `organisation_affected` | object | 0.0% | INGO, UN Agency, LNGO |
| `programme_focus` | object | 0.0% | No information, Health, Multiple |
| `aid_workers_killed` | int64 | 0.0% | 0.0 – 7.0 (mean 0.4695) |
| `aid_workers_injured` | int64 | 0.0% | 0.0 – 11.0 (mean 0.8028) |
| `aid_workers_kidnapped` | int64 | 0.0% | 0.0 – 11.0 (mean 0.3333) |
| `aid_workers_arrested` | int64 | 0.0% | 0.0 – 18.0 (mean 0.2548) |
| `known_kidnapping_or_arrest_outcome` | object | 78.0% | |
| `aid_workers_killed_in_captivity` | int64 | 0.0% | 0.0 – 3.0 (mean 0.0227) |
| `international_aid_workers_killed` | int64 | 0.0% | 0.0 – 3.0 (mean 0.0401) |
| `international_aid_workers_killed_in_captivity` | int64 | 0.0% | 0.0 – 1.0 (mean 0.0017) |
| `national_aid_workers_killed` | int64 | 0.0% | 0.0 – 7.0 (mean 0.3944) |
| `national_aid_workers_killed_in_captivity` | int64 | 0.0% | 0.0 – 3.0 (mean 0.0192) |
| `female_aid_workers_killed` | int64 | 0.0% | 0.0 – 1.0 (mean 0.0105) |
| `female_aid_workers_killed_in_captivity` | int64 | 0.0% | 0.0 – 0.0 (mean 0.0) |
| `male_aid_workers_killed` | int64 | 0.0% | 0.0 – 3.0 (mean 0.2373) |
| `male_aid_workers_killed_in_captivity` | int64 | 0.0% | 0.0 – 3.0 (mean 0.0192) |
| `international_aid_workers_injured` | int64 | 0.0% | 0.0 – 10.0 (mean 0.0716) |
| `national_aid_workers_injured` | int64 | 0.0% | 0.0 – 11.0 (mean 0.6178) |
| `female_aid_workers_injured` | int64 | 0.0% | 0.0 – 2.0 (mean 0.0489) |
| `male_aid_workers_injured` | int64 | 0.0% | 0.0 – 11.0 (mean 0.377) |
| `international_aid_workers_kidnapped` | int64 | 0.0% | 0.0 – 4.0 (mean 0.0279) |
| `national_aid_workers_kidnapped` | int64 | 0.0% | 0.0 – 10.0 (mean 0.281) |
| `female_aid_workers_kidnapped` | int64 | 0.0% | 0.0 – 4.0 (mean 0.0122) |
| `male_aid_workers_kidnapped` | int64 | 0.0% | |
| `international_aid_workers_arrested` | int64 | 0.0% | |
| `national_aid_workers_arrested` | int64 | 0.0% | |
| `female_aid_workers_arrested` | int64 | 0.0% | |
| `male_aid_workers_arrested` | int64 | 0.0% | |
| `sind_event_id` | int64 | 0.0% | |
| `date_event_entered` | datetime64[ns] | 0.0% | |
| `date_event_modified` | datetime64[ns] | 0.0% | |
| `esa_source` | object | 0.0% | |
| `esa_processed` | object | 0.0% | |
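
A table of this shape can be regenerated from a loaded split to confirm the types and null percentages. This is a minimal sketch, not the tool that produced the table above; `profile_schema` is a hypothetical helper and the example frame is made up.

```python
import pandas as pd

def profile_schema(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise each column's dtype and percentage of missing values."""
    return pd.DataFrame({
        "type": df.dtypes.astype(str),
        "null_pct": (df.isna().mean() * 100).round(1),
    })

# Illustrative frame reusing three column names from the schema
example = pd.DataFrame({
    "admin_1": ["Unity", "Jonglei", "Unity", "Central Equatoria"],
    "aid_workers_killed": [0, 1, 0, 2],
    "known_kidnapping_or_arrest_outcome": [None, None, None, "Released"],
})
schema = profile_schema(example)
print(schema)
```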
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `aid_workers_killed` | 0.0 | 7.0 | 0.4695 | 0.0 |
| `aid_workers_injured` | 0.0 | 11.0 | 0.8028 | 1.0 |
| `aid_workers_kidnapped` | 0.0 | 11.0 | 0.3333 | 0.0 |
| `aid_workers_arrested` | 0.0 | 18.0 | 0.2548 | 0.0 |
| `aid_workers_killed_in_captivity` | 0.0 | 3.0 | 0.0227 | 0.0 |
| `international_aid_workers_killed` | 0.0 | 3.0 | 0.0401 | 0.0 |
| `international_aid_workers_killed_in_captivity` | 0.0 | 1.0 | 0.0017 | 0.0 |
| `national_aid_workers_killed` | 0.0 | 7.0 | 0.3944 | 0.0 |
| `national_aid_workers_killed_in_captivity` | 0.0 | 3.0 | 0.0192 | 0.0 |
| `female_aid_workers_killed` | 0.0 | 1.0 | 0.0105 | 0.0 |
| `female_aid_workers_killed_in_captivity` | 0.0 | 0.0 | 0.0 | 0.0 |
| `male_aid_workers_killed` | 0.0 | 3.0 | 0.2373 | 0.0 |
| `male_aid_workers_killed_in_captivity` | 0.0 | 3.0 | 0.0192 | 0.0 |
| `international_aid_workers_injured` | 0.0 | 10.0 | 0.0716 | 0.0 |
| `national_aid_workers_injured` | 0.0 | 11.0 | 0.6178 | 0.0 |
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. Three columns with more than 80% missing values were removed: `event_description`, `latitude` and `longitude`. The dataset was then split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
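
The steps described above can be approximated as follows. This is a sketch under stated assumptions, not the pipeline Electric Sheep Africa actually ran: the names `to_snake_case` and `curate` are hypothetical, marker matching is assumed to be case-insensitive, and `DataFrame.sample` stands in for whatever splitting routine was used, so it will not reproduce the published train/test membership.

```python
import re

import numpy as np
import pandas as pd

# Markers listed in the Curation paragraph, lowercased for comparison
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}

def to_snake_case(name: str) -> str:
    """Lowercase a column name and replace runs of non-alphanumerics with '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")

def _to_nan(value):
    """Map a missing-value marker to NaN; leave every other value unchanged."""
    if isinstance(value, str) and value.strip().lower() in MISSING_MARKERS:
        return np.nan
    return value

def curate(df: pd.DataFrame, max_missing: float = 0.8, seed: int = 42):
    """Standardise names, unify missing markers, drop sparse columns, split 80/20."""
    df = df.rename(columns=to_snake_case)
    df = df.apply(lambda col: col.map(_to_nan))
    # Keep only columns whose share of missing values is within the threshold
    df = df.loc[:, df.isna().mean() <= max_missing]
    # Assumed splitting method: random 80% sample with a fixed seed
    train = df.sample(frac=0.8, random_state=seed)
    test = df.drop(train.index)
    return train, test
```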
---
## Limitations
- Data originates from Insecurity Insight and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- One column, `known_kidnapping_or_arrest_outcome`, has more than 20% missing values (78.0% null) and should be treated with caution in modelling.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/south-sudan-attacks-on-civilians-and-vital-civilian-facilities) for the publisher's own methodology notes and caveats.
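
One common way to treat a sparsely populated categorical column with caution is to keep the missingness as explicit information instead of dropping rows. The sketch below illustrates that option only; it is not a recommendation from the publisher, and the helper name `add_missing_indicator` and the fill label `"Not recorded"` are assumptions.

```python
import pandas as pd

def add_missing_indicator(df: pd.DataFrame, column: str,
                          fill_value: str = "Not recorded") -> pd.DataFrame:
    """Return a copy with a boolean `<column>_missing` flag and NaNs filled."""
    out = df.copy()
    out[f"{column}_missing"] = out[column].isna()
    out[column] = out[column].fillna(fill_value)
    return out

# Illustrative frame (values are made up)
col = "known_kidnapping_or_arrest_outcome"
example = pd.DataFrame({col: [None, "Released", None]})
flagged = add_missing_indicator(example, col)
print(flagged)
```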
---
## Citation
```bibtex
@dataset{hdx_africa_south_sudan_attacks_on_civilians_and_vital_civilian_facilities,
title = {South Sudan (SSD): Attacks on Aid Operations, Education, Health Care, Food and Water Systems, and IDP/Refugee Camps, and Conflict-Related Sexual Violence Incident Data},
author = {Insecurity Insight},
year = {2026},
url = {https://data.humdata.org/dataset/south-sudan-attacks-on-civilians-and-vital-civilian-facilities},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provided by:
electricsheepafrica
Collected summary
Dataset introduction

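Construction
In conflict and humanitarian research, datasets are typically built from authoritative organisations' systematic records of real-world events. This dataset was compiled by Insecurity Insight from public reports. The raw data comes from the Humanitarian Data Exchange (HDX) and covers attacks in South Sudan on aid operations, education, health care, food and water systems, and camps for displaced people, as well as incidents of conflict-related sexual violence. Data collection follows a standard event-reporting process: each record corresponds to one discrete event, described in structured form by fields covering time, geography and the parties involved. The Electric Sheep Africa team then adapted the raw data for machine learning by standardising field names, unifying missing-value markers and removing fields with a high share of missing values. Finally, the data was converted to Parquet and split into training and test sets, giving a ready-to-use basis for further analysis.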
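Characteristics
The dataset is clearly structured for humanitarian security monitoring. Its unit of observation is the event: it contains 573 records, each described by 42 variables covering the time and place of the event, casualties among those involved, perpetrator attributes, and the type of organisation affected. The fields span geographic, temporal, demographic, outcome-measurement and metadata categories, with 26 numeric and 13 categorical variables available for quantitative analysis. Notably, casualties among aid workers are broken down by gender and by national versus international status, which gives a fine-grained view for studying the vulnerability of specific groups in conflict. The dataset is also largely complete, with low missing rates in most key fields, and it is pre-split into training and test sets for model development and evaluation.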
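Usage
For researchers working on conflict analysis and humanitarian response, the dataset offers a direct starting point for training and validating machine-learning models. It can be loaded with the Hugging Face `datasets` library and converted to a pandas DataFrame in Python for exploratory analysis. It suits several supervised learning tasks: classification models based on geographic, temporal and event features can be used to predict the type or severity of an attack, while regression models can attempt to estimate numeric variables such as casualty counts. Before modelling, check the correlations between fields and handle variables with high missing rates (such as `known_kidnapping_or_arrest_outcome`) with care. The temporal and geographic attributes also support time-series analysis and geographic visualisation, which can help reveal dynamic patterns and spatial clustering of attacks.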
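
As a starting point for the time-series analysis mentioned above, incidents can be counted per month and per `admin_1` region. This is a minimal sketch assuming the `date` and `admin_1` columns from the schema; `monthly_incidents` is a hypothetical helper and the example rows are made up.

```python
import pandas as pd

def monthly_incidents(df: pd.DataFrame) -> pd.DataFrame:
    """Count incidents per calendar month (rows) and admin_1 region (columns)."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    counts = out.groupby([pd.Grouper(key="date", freq="MS"), "admin_1"]).size()
    # Region/month pairs with no recorded incident become 0 instead of NaN
    return counts.unstack(fill_value=0)

# Illustrative rows (not taken from the dataset)
example = pd.DataFrame({
    "date": ["2021-01-03", "2021-01-20", "2021-02-11"],
    "admin_1": ["Unity", "Unity", "Jonglei"],
})
counts = monthly_incidents(example)
print(counts)
```

Counts of this kind reflect reported incidents only, so changes over time may partly reflect changes in reporting.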
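Background and challenges
Background overview
In research on complex emergencies and humanitarian crises, the systematic recording and analysis of attacks on civilians and vital civilian facilities in conflict areas forms the empirical basis for evaluating interventions and protecting vulnerable groups. The South Sudan attack incident dataset, created by Insecurity Insight and released in 2026, focuses on violence in South Sudan affecting aid operations, education, health care, food and water systems, and camps for displaced people, and also covers incidents of conflict-related sexual violence. Its central research question is how to quantify and characterise the impact of conflict on sectors vital to civilian life. It aims to provide structured, event-level data in support of humanitarian response, policy-making and academic research, and its release adds to the understanding of conflict dynamics and their humanitarian consequences.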
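Current challenges
The dataset addresses a key problem in conflict analysis and humanitarian protection: how to build quantifiable, analysable, standardised data from scattered incident reports in order to support trend forecasting, risk modelling and resource-allocation decisions. Domain-level challenges include incomplete temporal and spatial information in incident reports, ambiguous perpetrator information, and the difficulty of standardising severity measures across sectors such as health care and education. Challenges in construction stem mainly from the heterogeneity of the raw data: handling fields with many missing values (such as `known_kidnapping_or_arrest_outcome`), the limits on precise spatial analysis caused by widely missing geographic coordinates, and the difficulty of keeping event classification and variable definitions semantically consistent when turning unstructured reports into ML-ready tabular data.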
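Common scenarios
Classic use cases
In conflict and humanitarian research, the dataset provides a structured basis for analysing violence against civilians and vital civilian facilities. Its classic use case is applying machine-learning models to predict the risk of attacks in South Sudan on aid operations, education, health care, food and water systems, and camps for displaced people. Researchers typically use its temporal, geographic and demographic variables to build classification or regression models that identify high-threat areas and assess incident severity, providing empirical support for quantitative research on conflict dynamics.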
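Derived and related work
Several lines of research have grown out of this dataset, including conflict-event prediction models built on it, frameworks for assessing the security situation of aid workers, and algorithms that detect attack hotspots from geographic information. Related research has further advanced humanitarian data science, for example by fusing event data with satellite imagery and social media information to improve the timeliness and accuracy of conflict monitoring. This work extends the use of such data in machine learning and gives policy-makers a more fine-grained evidence base.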
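Recent research on the dataset
Latest research directions
In the context of complex humanitarian emergencies, data on attacks against civilians and vital civilian facilities in South Sudan is becoming a focus of conflict analysis and predictive modelling. Researchers use machine-learning techniques to study the temporal and spatial distribution of attacks and the behaviour patterns of perpetrators, with the aim of building risk early-warning systems that protect humanitarian aid operations. The dataset is closely tied to issues of global concern, such as tracking and responding to conflict-related sexual violence and protecting food systems and health services in fragile areas. Its use supports data-driven decisions in humanitarian action and gives international organisations an empirical basis for designing targeted conflict-mitigation measures.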
The above content was collected and summarised by 遇见数据集.



