electricsheepafrica/africa-gabiley-district-conflict-and-security-assessment-2015
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- tabular-classification
- other
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- som
pretty_name: "Gabiley District Conflict and Security Assessment - 2015"
dataset_info:
splits:
- name: train
num_examples: 110
- name: test
num_examples: 27
---
# Gabiley District Conflict and Security Assessment - 2015
**Publisher:** Observatory of Conflict and Violence Prevention (inactive) · **Source:** [HDX](https://data.humdata.org/dataset/gabiley-district-conflict-and-security-assessment-2015) · **License:** `cc-by-igo` · **Updated:** 2023-02-28
---
## Abstract
As part of its continual assessment of issues directly affecting community security and safety, the Observatory of Conflict and Violence Prevention (OCVP) conducted an extensive collection of primary data in the Gabiley District of Somaliland.
Each row in this dataset represents one subnational administrative unit observation. The data was last updated on HDX on 2023-02-28. Geographic scope: **SOM**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Public health |
| **Unit of observation** | Subnational administrative unit observations |
| **Rows (total)** | 138 |
| **Columns** | 149 (38 numeric, 111 categorical, 0 datetime) |
| **Train split** | 110 rows |
| **Test split** | 27 rows |
| **Geographic scope** | SOM |
| **Publisher** | Observatory of Conflict and Violence Prevention (inactive) |
| **HDX last updated** | 2023-02-28 |
---
## Variables
**Geographic:** `region_name` (range 1.0–1.0), `district_name` (range 1.0–1.0), `reporting_petty_crime` (range 1.0–777.0), `reporting_petty_other`, `police_yearly_trend` (range 1.0–777.0) and 33 others.
**Demographic:** `village_name` (range 1.0–2.0), `gender_responder` (range 1.0–2.0), `age` (range 1.0–6.0), `legal_clinic_issuehhviolence`, `court_issuehhviolence` and 2 others.
**Outcome / Measurement:** `number_of_stations` (range 1.0–777.0), `number_of_stations_other`, `number_of_courts` (range 1.0–2.0), `number_of_courts_other`, `number_of_conflicts` and 2 others.
**Identifier / Metadata:** `legal_clinic_ref` (sample values: empty, 1), `legal_clinic_ref_other`, `legal_clinic_issuebusidisputes`, `court_ref`, `court_ref_other` and 11 others.
**Other:** `marital_status` (range 1.0–4.0), `level_education` (range 1.0–7.0), `police_presense` (range 1.0–2.0), `distance_to_station` (range 1.0–777.0), `reporting_civil` (range 1.0–777.0) and 76 others.
---
## Quick Start
```python
from datasets import load_dataset
ds = load_dataset("electricsheepafrica/africa-gabiley-district-conflict-and-security-assessment-2015")
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()
print(train.shape)
train.head()
```
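
The card lists 110 training rows and 27 test rows alongside a total of 138 rows, so it may be worth confirming the split sizes after loading. The following is a minimal sketch that assumes pandas frames like `train` and `test` above; `split_report` is a hypothetical helper, and the toy frames only stand in for the real splits so that the block runs offline.

```python
import pandas as pd

def split_report(train: pd.DataFrame, test: pd.DataFrame) -> dict:
    """Summarise split sizes and count rows that appear in both partitions."""
    total = len(train) + len(test)
    # An inner merge on all shared columns finds rows present in both splits.
    # Identical answers from different respondents would also be counted here.
    shared = train.drop_duplicates().merge(test.drop_duplicates(), how="inner")
    return {
        "train_rows": len(train),
        "test_rows": len(test),
        "total_rows": total,
        "train_fraction": round(len(train) / total, 3),
        "same_columns": list(train.columns) == list(test.columns),
        "rows_in_both": len(shared),
    }

# Toy stand-ins for ds["train"].to_pandas() and ds["test"].to_pandas().
toy_train = pd.DataFrame({"age": [1, 2, 3, 4], "how_safe": [1, 1, 2, 2]})
toy_test = pd.DataFrame({"age": [5], "how_safe": [1]})
report = split_report(toy_train, toy_test)
print(report)
```

On the real data, `split_report(train, test)` can be compared directly with the row counts stated in the Dataset Characteristics table.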
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `region_name` | int64 | 0.0% | 1.0 – 1.0 (mean 1.0) |
| `district_name` | int64 | 0.0% | 1.0 – 1.0 (mean 1.0) |
| `village_name` | int64 | 0.0% | 1.0 – 2.0 (mean 1.4855) |
| `gender_responder` | int64 | 0.0% | 1.0 – 2.0 (mean 1.471) |
| `age` | int64 | 0.0% | 1.0 – 6.0 (mean 2.9493) |
| `marital_status` | int64 | 0.0% | 1.0 – 4.0 (mean 1.7319) |
| `level_education` | int64 | 0.0% | 1.0 – 7.0 (mean 3.913) |
| `police_presense` | int64 | 0.0% | 1.0 – 2.0 (mean 1.0072) |
| `number_of_stations` | float64 | 0.7% | 1.0 – 777.0 (mean 12.4964) |
| `number_of_stations_other` | object | 0.0% | |
| `distance_to_station` | float64 | 0.7% | 1.0 – 777.0 (mean 13.0365) |
| `reporting_civil` | int64 | 0.0% | 1.0 – 777.0 (mean 9.913) |
| `reporting_civil_other` | object | 0.0% | |
| `reporting_petty_crime` | int64 | 0.0% | 1.0 – 777.0 (mean 10.2754) |
| `reporting_petty_other` | object | 0.0% | |
| `reporting_serious_crime` | int64 | 0.0% | 1.0 – 777.0 (mean 10.2971) |
| `reporting_serious_other` | object | 0.0% | |
| `trusted_sec_prov` | int64 | 0.0% | 2.0 – 777.0 (mean 10.3986) |
| `trusted_sec_other` | object | 0.0% | |
| `reason_for_choice_sec` | float64 | 1.4% | 1.0 – 5.0 (mean 1.4559) |
| `reason_for_choice_sec_other` | object | 0.0% | (empty), Waxdhaama lamahayo, Iyaga amni ka masuul ah |
| `level_trust_police` | int64 | 0.0% | 1.0 – 777.0 (mean 20.4493) |
| `police_yearly_trend` | int64 | 0.0% | 1.0 – 777.0 (mean 18.1087) |
| `court_presense` | int64 | 0.0% | 1.0 – 777.0 (mean 6.6232) |
| `number_of_courts` | float64 | 0.7% | 1.0 – 2.0 (mean 1.0073) |
| `number_of_courts_other` | object | 0.0% | |
| `where_is_court` | float64 | 0.7% | 1.0 – 2.0 (mean 1.0073) |
| `distance_to_court` | float64 | 1.4% | |
| `legal_clinic_aware` | int64 | 0.0% | |
| `legal_clinic_use` | object | 0.0% | (empty), 2, 1 |
| `legal_clinic_ref` | object | 0.0% | (empty), 1 |
| `legal_clinic_ref_other` | object | 0.0% | |
| `legal_clinic_issuelanddispute` | object | 0.0% | |
| `legal_clinic_issuebusidisputes` | object | 0.0% | |
| `legal_clinic_issuerobbery` | object | 0.0% | |
| `legal_clinic_issueyouthviol` | object | 0.0% | |
| `legal_clinic_issuehhviolence` | object | 0.0% | |
| `legal_clinic_issueassault` | object | 0.0% | |
| `legal_clinic_issueother` | object | 0.0% | |
| `legal_clinic_issuerta` | object | 0.0% | |
| `legal_clinic_issue_other` | object | 0.0% | |
| `legal_clinic_judgement` | object | 0.0% | |
| `legal_clinic_enforced` | object | 0.0% | |
| `court_use` | int64 | 0.0% | |
| `court_ref` | object | 0.0% | |
| `court_ref_other` | object | 0.0% | |
| `court_issuelanddispute` | object | 0.0% | |
| `court_issuebusidisputes` | object | 0.0% | |
| `court_issuerobbery` | object | 0.0% | |
| `court_issueyouthviol` | object | 0.0% | |
| `court_issuehhviolence` | object | 0.0% | |
| `court_issueassault` | object | 0.0% | |
| `court_issueother` | object | 0.0% | |
| `court_issuerta` | object | 0.0% | |
| `court_issue_other` | object | 0.0% | |
| `court_judgement` | object | 0.0% | |
| `court_enforced` | object | 0.0% | |
| `elders_use` | int64 | 0.0% | |
| `elders_ref` | object | 0.0% | |
| `elders_ref_other` | object | 0.0% | |
| `elders_issuelanddispute` | object | 0.0% | |
| `elders_issuebusidisputes` | object | 0.0% | |
| `elders_issuerobbery` | object | 0.0% | |
| `elders_issueyouthviol` | object | 0.0% | |
| `elders_issuehhviolence` | object | 0.0% | |
| `elders_issueassault` | object | 0.0% | |
| `elders_issueother` | object | 0.0% | |
| `elders_issuerta` | object | 0.0% | |
| `elders_issue_other` | object | 0.0% | |
| `elders_judgement` | object | 0.0% | |
| `elders_enforced` | object | 0.0% | |
| `religious_use` | int64 | 0.0% | |
| `religious_ref` | object | 0.0% | |
| `religious_ref_other` | object | 0.0% | |
| `religious_issuelanddispute` | object | 0.0% | |
| `religious_issuebusidisputes` | object | 0.0% | |
| `religious_issuerobbery` | object | 0.0% | |
| `religious_issueyouthviol` | object | 0.0% | |
| `religious_issuehhviolence` | object | 0.0% | |
| `religious_issueassault` | object | 0.0% | |
| `religious_issueother` | object | 0.0% | |
| `religious_issuerta` | object | 0.0% | |
| `religious_issue_other` | object | 0.0% | |
| `religious_judgement` | object | 0.0% | |
| `religious_enforced` | object | 0.0% | |
| `trusted_just_prov` | int64 | 0.0% | |
| `trusted_just_prov_other` | object | 0.0% | |
| `reason_for_choice_just` | float64 | 2.2% | |
| `reason_for_choice_just_other` | object | 0.0% | |
| `conf_formal_just` | int64 | 0.0% | |
| `court_yearly_trend` | int64 | 0.0% | |
| `local_council_aware` | int64 | 0.0% | |
| `loc_gov_serviceseducation` | object | 0.0% | |
| `loc_gov_serviceshealth` | object | 0.0% | |
| `loc_gov_servicessecurity` | object | 0.0% | |
| `loc_gov_servicesjustice` | object | 0.0% | |
| `loc_gov_servicesagriculture` | object | 0.0% | |
| `loc_gov_servicesinfrastructure` | object | 0.0% | |
| `loc_gov_servicessanitation` | object | 0.0% | |
| `loc_gov_serviceswater` | object | 0.0% | |
| `loc_gov_servicesother` | object | 0.0% | |
| `loc_gov_servicesdontknow` | object | 0.0% | |
| `loc_gov_servicesrta` | object | 0.0% | |
| `loc_gov_services_other` | object | 0.0% | |
| `channels_comm` | float64 | 1.4% | |
| `consultation_participation` | float64 | 1.4% | |
| `participation_frequency` | object | 0.0% | |
| `participation_frequency_other` | object | 0.0% | |
| `elected_opinion` | int64 | 0.0% | |
| `community_issueslackofwater` | object | 0.0% | |
| `community_issuesdrought` | object | 0.0% | |
| `community_issueslofinfrastructure` | object | 0.0% | |
| `community_issuespoorsanitation` | object | 0.0% | |
| `community_issuespoorhealth` | object | 0.0% | |
| `community_issuesunemployment` | object | 0.0% | |
| `community_issuespooreducation` | object | 0.0% | |
| `community_issuesshortelectsupply` | object | 0.0% | |
| `community_issuespooreconomy` | object | 0.0% | |
| `community_issuescharcoalpdefor` | object | 0.0% | |
| `community_issuesbadhealthc` | object | 0.0% | |
| `community_issuesinsecurity` | object | 0.0% | |
| `community_issuesgenderbasedv` | object | 0.0% | |
| `community_issuesother` | object | 0.0% | |
| `community_issuesdontknow` | object | 0.0% | |
| `community_issuesrta` | object | 0.0% | |
| `community_issues_other` | object | 0.0% | |
| `council_yearly_trend` | float64 | 1.4% | |
| `witnessed_conflict` | int64 | 0.0% | |
| `number_of_conflicts` | object | 0.0% | |
| `number_conf_violence` | object | 0.0% | |
| `number_casualties` | object | 0.0% | |
| `conflict_reasonresources` | object | 0.0% | |
| `conflict_reasonfamilydisp` | object | 0.0% | |
| `conflict_reasoncrime` | object | 0.0% | |
| `conflict_reasonpower` | object | 0.0% | |
| `conflict_reasonrevenge` | object | 0.0% | |
| `conflict_reasonbusidisputes` | object | 0.0% | |
| `conflict_reasonrape` | object | 0.0% | |
| `conflict_reasonlackofjustice` | object | 0.0% | |
| `conflict_reasonyouthviol` | object | 0.0% | |
| `conflict_reasonother` | object | 0.0% | |
| `conflict_reasondontknow` | object | 0.0% | |
| `conflict_reasonrta` | object | 0.0% | |
| `conflict_reason_other` | object | 0.0% | |
| `witnessed_crimes` | int64 | 0.0% | |
| `how_safe` | int64 | 0.0% | |
| `safety_yearly_trend` | int64 | 0.0% | |
| `esa_source` | object | 0.0% | |
| `esa_processed` | object | 0.0% | |
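
Many `object` columns in the table report 0.0% nulls yet show no sample values, which is consistent with cells that hold empty strings rather than `NaN`. That is an inference, not something the card states. A minimal sketch for measuring it follows; `blank_share` is a hypothetical helper and the toy frame only imitates a few of the real columns.

```python
import pandas as pd

def blank_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of cells per non-numeric column that are NaN, empty, or whitespace."""
    text_cols = df.select_dtypes(exclude="number")
    # Count NaN as blank, and strip whitespace before comparing with "".
    is_blank = text_cols.apply(
        lambda s: s.isna() | (s.astype(str).str.strip() == "")
    )
    return is_blank.mean().sort_values(ascending=False)

# Toy data: column names come from the schema, values are made up.
toy = pd.DataFrame({
    "legal_clinic_use": ["", "2", "1", ""],
    "court_ref": ["", " ", "", ""],
    "age": [1, 2, 3, 4],
})
shares = blank_share(toy)
print(shares)
```

Columns whose blank share is close to 1.0 carry little information and may be candidates for dropping before modelling.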
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `region_name` | 1.0 | 1.0 | 1.0 | 1.0 |
| `district_name` | 1.0 | 1.0 | 1.0 | 1.0 |
| `village_name` | 1.0 | 2.0 | 1.4855 | 1.0 |
| `gender_responder` | 1.0 | 2.0 | 1.471 | 1.0 |
| `age` | 1.0 | 6.0 | 2.9493 | 3.0 |
| `marital_status` | 1.0 | 4.0 | 1.7319 | 2.0 |
| `level_education` | 1.0 | 7.0 | 3.913 | 4.0 |
| `police_presense` | 1.0 | 2.0 | 1.0072 | 1.0 |
| `number_of_stations` | 1.0 | 777.0 | 12.4964 | 1.0 |
| `distance_to_station` | 1.0 | 777.0 | 13.0365 | 2.0 |
| `reporting_civil` | 1.0 | 777.0 | 9.913 | 5.0 |
| `reporting_petty_crime` | 1.0 | 777.0 | 10.2754 | 5.0 |
| `reporting_serious_crime` | 1.0 | 777.0 | 10.2971 | 5.0 |
| `trusted_sec_prov` | 2.0 | 777.0 | 10.3986 | 5.0 |
| `reason_for_choice_sec` | 1.0 | 5.0 | 1.4559 | 1.0 |
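
Several columns above have a maximum of 777.0 while their medians lie between 1 and 5, which suggests that 777 is a survey code (for example "don't know" or "refused") rather than a measured quantity. The card does not document this, so the following is only a sketch under that assumption; `SENTINEL_CODES` and `mask_sentinels` are hypothetical names, and the toy frame merely imitates the real columns.

```python
import pandas as pd

# Assumption: 777 is a non-substantive survey code, not a real measurement.
SENTINEL_CODES = [777]

def mask_sentinels(df: pd.DataFrame, codes=SENTINEL_CODES) -> pd.DataFrame:
    """Return a copy in which sentinel codes in numeric columns become NaN."""
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        # Series.mask replaces values with NaN wherever the condition is True.
        out[col] = out[col].mask(out[col].isin(codes))
    return out

# Toy data: column names come from the schema, values are made up.
toy = pd.DataFrame({
    "number_of_stations": [1.0, 2.0, 777.0, 1.0],
    "reporting_civil": [5, 777, 5, 3],
    "reporting_civil_other": ["", "", "", ""],
})
cleaned = mask_sentinels(toy)
print(cleaned.describe())
```

Means computed after masking are no longer inflated by the code value; whether that is appropriate depends on the publisher's codebook, which should be checked on the original HDX page.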
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. One exact duplicate row was removed. Ten columns were cast from string to numeric or datetime based on parse-success rate (>85% threshold). The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
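
The steps described above can be approximated in pandas. This is a sketch of the described pipeline, not ESA's actual code: the function names, the case-insensitive marker matching, the use of `pandas.to_numeric` for the parse-success test, and `DataFrame.sample` for the split are all assumptions, and datetime casting is omitted.

```python
import re
import pandas as pd

MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}

def to_snake_case(name: str) -> str:
    """Lowercase a column name and join its word runs with underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")

def unify_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Replace common missing-value markers in text columns with NaN."""
    out = df.copy()
    for col in out.select_dtypes(exclude="number").columns:
        lowered = out[col].astype(str).str.strip().str.lower()
        out[col] = out[col].mask(lowered.isin(MISSING_MARKERS))
    return out

def cast_mostly_numeric(df: pd.DataFrame, threshold: float = 0.85) -> pd.DataFrame:
    """Cast a text column to numeric when more than `threshold` of its values parse."""
    out = df.copy()
    for col in out.select_dtypes(exclude="number").columns:
        non_null = out[col].dropna()
        if non_null.empty:
            continue
        parsed = pd.to_numeric(non_null, errors="coerce")
        if parsed.notna().mean() > threshold:
            out[col] = pd.to_numeric(out[col], errors="coerce")
    return out

def split_80_20(df: pd.DataFrame, seed: int = 42):
    """Random 80/20 split with a fixed seed; expects a unique index."""
    train = df.sample(frac=0.8, random_state=seed)
    return train, df.drop(train.index)

# Toy raw table: 11 rows, the last two of which are exact duplicates.
raw = pd.DataFrame({
    "Village Name": ["1", "2", "1", "2", "1", "2", "1", "2", "1", "2", "2"],
    "Court Ref": ["N/A", "a", "b", "-", "c", "d", "e", "f", "g", "h", "h"],
})
clean = unify_missing(raw.rename(columns=to_snake_case))
clean = clean.drop_duplicates().reset_index(drop=True)
clean = cast_mostly_numeric(clean)
train, test = split_80_20(clean)
print(len(clean), len(train), len(test))
# train.to_parquet("train.parquet", compression="snappy")  # needs pyarrow or fastparquet
```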
---
## Limitations
- Data originates from Observatory of Conflict and Violence Prevention (inactive) and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/gabiley-district-conflict-and-security-assessment-2015) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_gabiley_district_conflict_and_security_assessment_2015,
title = {Gabiley District Conflict and Security Assessment - 2015},
author = {Observatory of Conflict and Violence Prevention (inactive)},
year = {2023},
url = {https://data.humdata.org/dataset/gabiley-district-conflict-and-security-assessment-2015},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider:
electricsheepafrica
Compiled summary
Dataset overview

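Construction
In conflict and security assessment, rigorous data collection is essential. The dataset was built by the Observatory of Conflict and Violence Prevention through field surveys, with systematic data collection on community security issues in the Gabiley District of Somaliland. The raw data was published on the Humanitarian Data Exchange (HDX) and then standardised by the Electric Sheep Africa team, which unified missing-value markers, converted the data to Parquet, and removed duplicate records, yielding a machine-learning-ready dataset of 138 observations.
Usage
For machine-learning work, the dataset can be loaded through the Hugging Face ecosystem. Calling the `datasets` library with the dataset path returns the preprocessed, standardised data, and `to_pandas()` converts it to a data frame for exploratory analysis. The dataset is suited to classification and regression modelling, with feature engineering built around tasks such as predicting security trends or identifying conflict factors. Users should consult the original publisher's methodology notes and handle potential limitations, such as sampling bias and inconsistent definitions, with care.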
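Background and challenges
Background overview
In conflict and security research, micro-level assessments of specific areas are essential for understanding community dynamics. The Gabiley District Conflict and Security Assessment - 2015 dataset was created in 2015 by the now-inactive Observatory of Conflict and Violence Prevention (OCVP) to collect primary data systematically in the Gabiley District of Somaliland and to assess the core issues that directly affect community security and stability. Its 138 observation records cover geographic, demographic, justice-institution presence, crime-reporting, and conflict-experience variables, providing a valuable local empirical basis for quantitative analysis in public health and security. In 2023, Electric Sheep Africa reformatted it for machine learning so that it supports tasks such as tabular classification, enabling deeper research into security governance in fragile regions.
Current challenges
The dataset addresses the complexity of assessing security conditions in conflict-affected areas. Its core challenge is how to infer community security patterns accurately from a limited local sample while handling highly imbalanced class distributions and many missing values. The challenges during construction were especially pronounced: the original data collection faced the difficulties of field research, including respondents who may have given inaccurate information out of safety concerns and variable definitions that are inconsistent across cultural contexts. In addition, the cleaning stage had to unify varied missing-value markers and convert some string columns to numeric types, and automated processing cannot correct potential misreporting or sampling bias in the raw data. All of these factors constrain model generalisation and reliability.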
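Common scenarios
Classic use cases
In conflict and security research, the dataset provides a micro-level empirical basis for analysing community security dynamics in the Gabiley District of Somaliland. Researchers typically use its wide range of variables, such as police presence, crime-reporting trends, and access to justice institutions, to build classification or regression models that identify the key factors shaping perceptions of community security. Machine-learning methods can reveal complex associations between demographic characteristics and security issues, offering data-driven insight for regional security assessment.
Academic problems addressed
The dataset addresses the academic problem of insufficient quantification of local security mechanisms in conflict research. By systematically collecting community-level data on security perceptions, use of justice services, and conflict experience, it lets scholars test empirically the relative effectiveness of informal justice institutions and conventional policing in conflict mediation. Its significance lies in providing scarce standardised data for research on security governance in fragile regions, advancing evidence-based theories of conflict prevention, and filling an empirical gap in subregional security assessment in Africa.
Practical applications
In practice, humanitarian organisations and local governance bodies can use the dataset to plan targeted interventions. Analysing community-reported priorities among security issues and access to justice services can help optimise resource allocation, for example by strengthening police deployment in high-crime areas or setting up legal clinics in villages where land disputes are frequent. Such data-driven decisions can make community security programmes more targeted and efficient and support stabilisation and reconstruction in Somaliland and similar regions.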
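Recent research on the dataset
Latest research directions
At the intersection of conflict analysis and public health, the Gabiley District Conflict and Security Assessment - 2015 dataset provides valuable subnational observations for studying community security dynamics in Somaliland. It covers police presence, access to justice, conflict drivers, and community issues, and is being applied in machine-learning models to predict sources of regional instability and to explore the interaction between informal justice mechanisms and the formal legal system. With the rise of humanitarian data science, fine-grained assessments of this kind support the construction of early-warning systems and provide data-driven insight for conflict prevention and post-conflict reconstruction, with particular policy relevance in resource-poor settings.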
The content above was collected and summarised by 遇见数据集.



