electricsheepafrica/africa-erbera-district-conflict-and-security-assessment-2015
Source: Hugging Face · Updated: 2026-04-11 · Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-erbera-district-conflict-and-security-assessment-2015
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
- other
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- complex-emergency-conflict-security
- conflict-violence
- som
pretty_name: "Berbera District Conflict and Security Assessment - 2015"
dataset_info:
splits:
- name: train
num_examples: 160
- name: test
num_examples: 40
---
# Berbera District Conflict and Security Assessment - 2015
**Publisher:** Observatory of Conflict and Violence Prevention (inactive) · **Source:** [HDX](https://data.humdata.org/dataset/erbera-district-conflict-and-security-assessment-2015) · **License:** `cc-by-igo` · **Updated:** 2023-03-03
---
## Abstract
As part of its continual assessment of issues directly affecting community security and safety, the Observatory of Conflict and Violence Prevention (OCVP) conducted an extensive collection of primary data in the Berbera District, the regional administration of the Sahil region of Somaliland.
Further details: http://www.ocvp.org/ocvp5/index.php/publications/dcsa/51-berbera-district-conflict-and-security-assessment-report-2015
Each row in this dataset represents one subnational administrative unit observation. Data was last updated on HDX on 2023-03-03. Geographic scope: **SOM** (Somalia, ISO 3166-1 alpha-3).
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Public health |
| **Unit of observation** | Subnational administrative unit observations |
| **Rows (total)** | 200 |
| **Columns** | 123 (39 numeric, 84 categorical, 0 datetime) |
| **Train split** | 160 rows |
| **Test split** | 40 rows |
| **Geographic scope** | SOM |
| **Publisher** | Observatory of Conflict and Violence Prevention (inactive) |
| **HDX last updated** | 2023-03-03 |
---
## Variables
**Geographic**: `region_name` (range 1.0–1.0), `district_name` (range 1.0–1.0), `reporting_petty_crime` (range 1.0–5.0), `reporting_petty_other`, `police_yearly_trend` (range 1.0–777.0) and 24 others.
**Demographic**: `village_name` (range 1.0–4.0), `gender_responder` (range 1.0–2.0), `age` (range 1.0–6.0).
**Outcome / Measurement**: `number_of_stations` (range 1.0–5.0), `number_of_stations_other`, `number_of_courts` (range 1.0–777.0), `number_of_courts_other`, `number_of_conflicts` and 2 others.
**Identifier / Metadata**: `legal_clinic_ref`, `legal_clinic_ref_other`, `court_ref`, `court_ref_other`, `elders_ref` and 8 others.
**Other**: `marital_status` (range 1.0–4.0), `level_education` (range 1.0–7.0), `police_presense` (range 1.0–2.0), `distance_to_station` (range 1.0–2.0), `reporting_civil` (range 1.0–6.0) and 66 others.
---
## Quick Start
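```python
from datasets import load_dataset

# Download both splits from the Hugging Face Hub.
ds = load_dataset("electricsheepafrica/africa-erbera-district-conflict-and-security-assessment-2015")

# Convert each split to a pandas DataFrame for exploration.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)  # (rows, columns)
print(train.head())
```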
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `region_name` | int64 | 0.0% | 1.0 – 1.0 (mean 1.0) |
| `district_name` | int64 | 0.0% | 1.0 – 1.0 (mean 1.0) |
| `village_name` | int64 | 0.0% | 1.0 – 4.0 (mean 2.1) |
| `gender_responder` | int64 | 0.0% | 1.0 – 2.0 (mean 1.44) |
| `age` | int64 | 0.0% | 1.0 – 6.0 (mean 3.01) |
| `marital_status` | int64 | 0.0% | 1.0 – 4.0 (mean 1.81) |
| `level_education` | int64 | 0.0% | 1.0 – 7.0 (mean 3.84) |
| `police_presense` | int64 | 0.0% | 1.0 – 2.0 (mean 1.07) |
| `number_of_stations` | float64 | 7.0% | 1.0 – 5.0 (mean 1.8871) |
| `number_of_stations_other` | object | 0.0% | |
| `distance_to_station` | float64 | 7.0% | 1.0 – 2.0 (mean 1.0269) |
| `reporting_civil` | int64 | 0.0% | 1.0 – 6.0 (mean 4.6) |
| `reporting_civil_other` | object | 0.0% | (empty), Mayarka |
| `reporting_petty_crime` | int64 | 0.0% | 1.0 – 5.0 (mean 4.925) |
| `reporting_petty_other` | object | 0.0% | |
| `reporting_serious_crime` | int64 | 0.0% | 1.0 – 6.0 (mean 4.955) |
| `reporting_serious_other` | object | 0.0% | (empty), Kuf |
| `trusted_sec_prov` | int64 | 0.0% | 1.0 – 6.0 (mean 4.76) |
| `trusted_sec_other` | object | 0.0% | (empty), Non |
| `reason_for_choice_sec` | float64 | 0.5% | 1.0 – 5.0 (mean 1.6683) |
| `reason_for_choice_sec_other` | object | 0.0% | (empty), Amaan buuxa ayaan ku helaynaa, Waa amnigii qaranka |
| `level_trust_police` | int64 | 0.0% | 1.0 – 4.0 (mean 3.26) |
| `police_yearly_trend` | int64 | 0.0% | 1.0 – 777.0 (mean 24.86) |
| `court_presense` | int64 | 0.0% | 1.0 – 2.0 (mean 1.025) |
| `number_of_courts` | float64 | 2.5% | 1.0 – 777.0 (mean 9.4923) |
| `number_of_courts_other` | object | 0.0% | |
| `where_is_court` | float64 | 2.5% | 1.0 – 777.0 (mean 5.0) |
| `distance_to_court` | float64 | 5.0% | |
| `legal_clinic_aware` | int64 | 0.0% | |
| `legal_clinic_use` | object | 0.0% | (empty), 2 |
| `legal_clinic_ref` | object | 0.0% | |
| `legal_clinic_ref_other` | object | 0.0% | |
| `legal_clinic_issue` | object | 0.0% | |
| `legal_clinic_issue_other` | object | 0.0% | |
| `legal_clinic_judgement` | object | 0.0% | |
| `legal_clinic_enforced` | object | 0.0% | |
| `court_use` | int64 | 0.0% | |
| `court_ref` | object | 0.0% | |
| `court_ref_other` | object | 0.0% | |
| `court_issue` | object | 0.0% | |
| `court_issue_other` | object | 0.0% | |
| `court_judgement` | object | 0.0% | |
| `court_enforced` | object | 0.0% | |
| `elders_use` | int64 | 0.0% | |
| `elders_ref` | object | 0.0% | |
| `elders_ref_other` | object | 0.0% | |
| `elders_issue` | object | 0.0% | |
| `elders_issue_other` | object | 0.0% | |
| `elders_judgement` | object | 0.0% | |
| `elders_enforced` | object | 0.0% | |
| `religious_use` | int64 | 0.0% | |
| `religious_ref` | object | 0.0% | |
| `religious_ref_other` | object | 0.0% | |
| `religious_issue` | object | 0.0% | |
| `religious_issue_other` | object | 0.0% | |
| `religious_judgement` | object | 0.0% | |
| `religious_enforced` | object | 0.0% | |
| `trusted_just_prov` | int64 | 0.0% | |
| `trusted_just_prov_other` | object | 0.0% | |
| `reason_for_choice_just` | float64 | 0.5% | |
| `reason_for_choice_just_other` | object | 0.0% | |
| `conf_formal_just` | int64 | 0.0% | |
| `court_yearly_trend` | int64 | 0.0% | |
| `local_council_aware` | int64 | 0.0% | |
| `aware_of_services` | float64 | 4.0% | |
| `channels_comm` | float64 | 4.0% | |
| `consultation_participation` | object | 0.0% | |
| `participation_frequency` | object | 0.0% | |
| `participation_frequency_other` | object | 0.0% | |
| `elected_opinion` | int64 | 0.0% | |
| `loc_gov_serviceseducation` | object | 0.0% | |
| `loc_gov_serviceshealth` | object | 0.0% | |
| `loc_gov_servicessecurity` | object | 0.0% | |
| `loc_gov_servicesjustice` | object | 0.0% | |
| `loc_gov_servicesagriculture` | object | 0.0% | |
| `loc_gov_servicesinfrastructure` | object | 0.0% | |
| `loc_gov_servicessanitation` | object | 0.0% | |
| `loc_gov_serviceswater` | object | 0.0% | |
| `loc_gov_servicesother` | object | 0.0% | |
| `loc_gov_servicesdont_know` | object | 0.0% | |
| `loc_gov_servicesrefused_to_answer` | object | 0.0% | |
| `loc_gov_services_other` | object | 0.0% | |
| `community_issueslack_of_water` | object | 0.0% | |
| `community_issuesdrought` | object | 0.0% | |
| `community_issueslack_of_infrastructure` | object | 0.0% | |
| `community_issuespoor_sanitation` | object | 0.0% | |
| `community_issuespoor_health` | object | 0.0% | |
| `community_issuesunemployment` | object | 0.0% | |
| `community_issuespoor_education` | object | 0.0% | |
| `community_issuesshortage_of_electicity_supply` | object | 0.0% | |
| `community_issuespoor_economy` | object | 0.0% | |
| `community_issuescharcoal_production_deforestation` | object | 0.0% | |
| `community_issuesbad_health_centers` | object | 0.0% | |
| `community_issuesinsecurity` | object | 0.0% | |
| `community_issuesgender_based_violence` | object | 0.0% | |
| `community_issuesother` | object | 0.0% | |
| `community_issuesdont_know` | object | 0.0% | |
| `community_issuesrefused_to_answer` | object | 0.0% | |
| `community_issues_other` | object | 0.0% | |
| `council_yearly_trend` | float64 | 4.5% | |
| `witnessed_conflict` | float64 | 0.5% | |
| `number_of_conflicts` | object | 0.0% | |
| `number_conf_violence` | object | 0.0% | |
| `number_casualties` | object | 0.0% | |
| `conflict_reasonresources` | object | 0.0% | |
| `conflict_reasonfamily_disputes` | object | 0.0% | |
| `conflict_reasoncrime` | object | 0.0% | |
| `conflict_reasonpower` | object | 0.0% | |
| `conflict_reasonrevenge` | object | 0.0% | |
| `conflict_reasonbusiness_disputes` | object | 0.0% | |
| `conflict_reasonrape` | object | 0.0% | |
| `conflict_reasonlack_of_justice` | object | 0.0% | |
| `conflict_reasonother` | object | 0.0% | |
| `conflict_reasondont_know` | object | 0.0% | |
| `conflict_reasonrefused_to_answer` | object | 0.0% | |
| `conflict_reason_other` | object | 0.0% | |
| `witnessed_crimes` | float64 | 0.5% | |
| `how_safe` | float64 | 0.5% | |
| `safety_yearly_trend` | float64 | 0.5% | |
| `nspc` | float64 | 8.0% | |
| `njpc` | object | 0.0% | |
| `esa_source` | object | 0.0% | |
| `esa_processed` | object | 0.0% | |
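
Several coded columns in the schema report a maximum of 777.0 (`police_yearly_trend`, `number_of_courts`, `where_is_court`), which pulls their means well above the rest of the scale. The sketch below masks that value before computing statistics. It assumes 777 is a survey non-response code rather than a real measurement; the card does not document the codebook, so check the original OCVP report before relying on this. The toy frame and the `SENTINELS` name are illustrative only.

```python
import numpy as np
import pandas as pd

# Assumption: 777 is a non-response code, not a measured value.
SENTINELS = [777]

# Toy stand-in for a few coded columns; these are not real survey rows.
df = pd.DataFrame({
    "police_yearly_trend": [1, 2, 777, 3],
    "number_of_courts": [1.0, 777.0, 2.0, np.nan],
})

coded_cols = ["police_yearly_trend", "number_of_courts"]

# Replace sentinel codes with NaN so they no longer distort the means.
cleaned = df.copy()
cleaned[coded_cols] = cleaned[coded_cols].mask(cleaned[coded_cols].isin(SENTINELS))

print(cleaned.mean())
```

The same mask can be applied to the real `train` and `test` frames once the list of affected columns has been confirmed.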
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `region_name` | 1.0 | 1.0 | 1.0 | 1.0 |
| `district_name` | 1.0 | 1.0 | 1.0 | 1.0 |
| `village_name` | 1.0 | 4.0 | 2.1 | 2.0 |
| `gender_responder` | 1.0 | 2.0 | 1.44 | 1.0 |
| `age` | 1.0 | 6.0 | 3.01 | 3.0 |
| `marital_status` | 1.0 | 4.0 | 1.81 | 2.0 |
| `level_education` | 1.0 | 7.0 | 3.84 | 4.0 |
| `police_presense` | 1.0 | 2.0 | 1.07 | 1.0 |
| `number_of_stations` | 1.0 | 5.0 | 1.8871 | 2.0 |
| `distance_to_station` | 1.0 | 2.0 | 1.0269 | 1.0 |
| `reporting_civil` | 1.0 | 6.0 | 4.6 | 5.0 |
| `reporting_petty_crime` | 1.0 | 5.0 | 4.925 | 5.0 |
| `reporting_serious_crime` | 1.0 | 6.0 | 4.955 | 5.0 |
| `trusted_sec_prov` | 1.0 | 6.0 | 4.76 | 5.0 |
| `reason_for_choice_sec` | 1.0 | 5.0 | 1.6683 | 1.0 |
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. Fifteen columns were cast from string to numeric or datetime where more than 85% of values parsed successfully. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
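
The steps described above can be approximated in pandas as follows. This is a rough sketch, not the curation pipeline itself: the function names, the toy input, the choice to measure parse success over non-missing values, and the use of `DataFrame.sample` for the split are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Markers listed in the card, compared case-insensitively.
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}
PARSE_THRESHOLD = 0.85  # cast only if more than 85% of values parse

def snake_case(df):
    """Lowercase column names and replace spaces with underscores."""
    return df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))

def unify_missing(df):
    """Replace common missing-value strings with NaN."""
    def fix(v):
        if isinstance(v, str) and v.strip().lower() in MISSING_MARKERS:
            return np.nan
        return v
    return df.apply(lambda col: col.map(fix))

def cast_numeric(df):
    """Cast string columns to numeric when enough values parse."""
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if pd.api.types.is_numeric_dtype(col):
            continue
        present = col.dropna()
        if present.empty:
            continue
        # Assumption: success rate is measured over non-missing values.
        success = pd.to_numeric(present, errors="coerce").notna().mean()
        if success > PARSE_THRESHOLD:
            out[name] = pd.to_numeric(col, errors="coerce")
    return out

def split_80_20(df, seed=42):
    """Hold out 20% of rows as a test set using a fixed seed."""
    test = df.sample(frac=0.2, random_state=seed)
    train = df.drop(test.index)
    return train, test

# Toy input standing in for the raw HDX export; not real survey rows.
raw = pd.DataFrame({
    "Age": ["1", "2", "N/A", "4", "5", "6", "1", "2", "3", "4"],
    "Village Name": ["a", "b", "c", "unknown", "a", "b", "c", "a", "b", "c"],
}, dtype=object)

clean = cast_numeric(unify_missing(snake_case(raw)))
train, test = split_80_20(clean)
print(clean.dtypes)
print(len(train), len(test))
```

Writing the result with `DataFrame.to_parquet(path, compression="snappy")` would match the storage format named above, provided a Parquet engine such as `pyarrow` is installed.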
---
## Limitations
- Data originates from Observatory of Conflict and Violence Prevention (inactive) and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/erbera-district-conflict-and-security-assessment-2015) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_erbera_district_conflict_and_security_assessment_2015,
title = {Berbera District Conflict and Security Assessment - 2015},
author = {Observatory of Conflict and Violence Prevention (inactive)},
year = {2023},
url = {https://data.humdata.org/dataset/erbera-district-conflict-and-security-assessment-2015},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
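Provider:
electricsheepafrica
Collected summary
Dataset introduction

Construction
In the field of conflict and security assessment, this dataset originates from field research in the Berbera district of Somaliland. It was built by the Observatory of Conflict and Violence Prevention through systematic collection of grassroots data and covers geographic, demographic, justice, and security variables, for a total of 200 observation records, each representing a subnational administrative unit. The raw data was published through the Humanitarian Data Exchange (HDX), and the Electric Sheep Africa team standardised, cleaned, and converted it, unifying missing-value markers and optimising data types. The result was split into 160 training samples and 40 test samples and stored in Parquet format for machine learning use.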
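Features
In public health and security research, the dataset has a clearly structured profile. It contains 123 variables spanning geographic identifiers, demographic attributes, security perceptions, access to justice, and community issues; 39 columns are numeric and 84 are categorical, giving a broad picture of the regional security environment. The data is in English and is small in scale but information-dense. Each observation is tied to village-level feedback, with indicators such as police presence, court use, and experience of conflict, providing an empirical basis for micro-level analysis of security dynamics. The dataset has been systematically preprocessed so that it can be used directly for machine learning.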
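Usage
For machine learning applications, the dataset suits tabular classification and regression tasks, such as predicting the security situation or modelling access to justice services. Users can load it directly with the Hugging Face `datasets` library, import the training and test sets in Python, and convert them to pandas DataFrames for exploratory analysis. Because it mixes many variable types, feature engineering is recommended before modelling, in particular encoding categorical variables and imputing missing values. Researchers can stratify analyses by geographic or demographic variables, or use domain knowledge to build composite indicators, to deepen understanding of the drivers of conflict and security.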
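
The encoding and imputation step recommended above might look like the following in pandas. It is a minimal sketch under stated assumptions: the toy frames stand in for the real splits, the column choices are illustrative, and median imputation plus one-hot encoding is only one reasonable option, not a method prescribed by the dataset card. The `prepare` helper is hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the train and test splits; not real survey rows.
train = pd.DataFrame({
    "number_of_stations": [1.0, 2.0, np.nan, 4.0],
    "village_name": [1, 2, 2, 3],
    "trusted_sec_other": ["", "Non", "", ""],
})
test = pd.DataFrame({
    "number_of_stations": [np.nan, 3.0],
    "village_name": [1, 4],
    "trusted_sec_other": ["", "Kuf"],
})

numeric_cols = ["number_of_stations"]
categorical_cols = ["village_name", "trusted_sec_other"]

# Learn imputation values from the training split only, to avoid leakage.
medians = train[numeric_cols].median()

def prepare(df, columns=None):
    """Impute numeric columns and one-hot encode categorical codes."""
    out = df.copy()
    out[numeric_cols] = out[numeric_cols].fillna(medians)
    out = pd.get_dummies(out, columns=categorical_cols, dtype=int)
    if columns is not None:
        # Align to the training layout; unseen categories are dropped.
        out = out.reindex(columns=columns, fill_value=0)
    return out

X_train = prepare(train)
X_test = prepare(test, columns=X_train.columns)
print(X_train.columns.tolist())
```

Integer-coded survey answers such as `village_name` are treated as categories here because the codes label groups rather than quantities.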
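Background and challenges
Background overview
In conflict and security research, systematic assessment of a specific area is the basis for understanding community dynamics and designing effective interventions. The Berbera District Conflict and Security Assessment - 2015 dataset was created in 2015 by the Observatory of Conflict and Violence Prevention, which is now inactive. By collecting primary data in the Berbera district of Somaliland's Sahil region, it aims to analyse the core issues that directly affect community security and stability. The dataset covers geographic, demographic, justice-service, and conflict-event variables, providing valuable local empirical material for quantitative research in public health and security. It was released through the Humanitarian Data Exchange, and Electric Sheep Africa adapted it for machine learning, improving the accessibility and potential use of the data in computational social science.
Current challenges
The dataset targets the complex problem of assessing and predicting the security situation in conflict-affected areas. Its core challenge is how to extract the key factors affecting security from high-dimensional, small-sample community survey data and build robust predictive models. Constructing the data involved several difficulties: the raw data was collected in an unstable environment and may contain reporting bias and inconsistent definitions; the sample is small, with only 200 observation records, which limits the reliability of statistical inference; and the variables include 84 categorical and 39 numeric features, with many missing values and unconventional codes such as the special value '777', which complicates data cleaning and feature engineering. In addition, the publishing organisation is no longer active, so independent validation and future updates are difficult, which poses a continuing challenge to the timeliness and completeness of the data.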
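Common scenarios
Classic use cases
In conflict and security research, the dataset gives scholars an empirical basis for analysing community security dynamics in the Berbera area of Somaliland. With variables covering geography, demographics, police presence, access to justice, and conflict events, researchers can build statistical models to explore the relationships among security perceptions, crime-reporting behaviour, and institutional trust. Classic applications use regression analysis or classification algorithms to predict community security trends or to identify the key factors shaping residents' perception of safety, revealing the micro-level mechanisms of security governance in fragile settings.
Academic problems addressed
The dataset addresses the scarcity of local security assessment data in development studies and peacebuilding research. It lets researchers empirically examine how security perceptions form in post-conflict societies, how access to justice affects conflict resolution, and what role informal governance institutions (such as councils of elders) play in providing security. By providing fine-grained subnational observations, it supports testing theories of community resilience in fragile states, advances the academic dialogue on the interaction between security sector reform and local governance, and offers key empirical evidence for understanding human security in complex emergencies.
Derived related work
Work derived from this dataset centres on predictive modelling of conflict risk with machine learning methods and on cross-national comparisons of the determinants of community security in different conflict settings. For example, scholars might integrate it with security assessment datasets from other parts of Africa to build a pan-African security index or to train algorithms for early warning of community violence. The dataset is also often cited in academic discussions of data-driven humanitarian response as a typical example of how raw survey data can be turned into formatted features for supervised learning, contributing to the development of humanitarian data science as an interdisciplinary field.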
The content above was collected and summarised by 遇见数据集.



