
electricsheepafrica/africa-galkayo-district-conflict-and-security-assessment-2015

Hugging Face · Updated 2026-04-11 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-galkayo-district-conflict-and-security-assessment-2015
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
- other
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- som
pretty_name: "Galkayo District Conflict and Security Assessment - 2015"
dataset_info:
  splits:
  - name: train
    num_examples: 157
  - name: test
    num_examples: 39
---

# Galkayo District Conflict and Security Assessment - 2015

**Publisher:** Observatory of Conflict and Violence Prevention (inactive) · **Source:** [HDX](https://data.humdata.org/dataset/galkayo-district-conflict-and-security-assessment-2015) · **License:** `cc-by-igo` · **Updated:** 2023-02-28

---

## Abstract

As part of its continual assessment of issues directly affecting community security and safety, OCVP conducted an extensive collection of primary data in Galkayo District, the capital of the central Mudug region of Somalia. Further details: http://www.ocvp.org/ocvp5/index.php/publications/dcsa/52-galkayo-district-conflict-and-security-assessment-report-2015

Each row in this dataset represents subnational administrative unit observations. Data was last updated on HDX on 2023-02-28. Geographic scope: **SOM**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*

---

## Dataset Characteristics

| | |
|---|---|
| **Domain** | Public health |
| **Unit of observation** | Subnational administrative unit observations |
| **Rows (total)** | 197 |
| **Columns** | 122 (31 numeric, 91 categorical, 0 datetime) |
| **Train split** | 157 rows |
| **Test split** | 39 rows |
| **Geographic scope** | SOM |
| **Publisher** | Observatory of Conflict and Violence Prevention (inactive) |
| **HDX last updated** | 2023-02-28 |

---

## Variables

**Geographic:** `region_name` (range 1.0–1.0), `district_name` (range 1.0–1.0), `reporting_petty_crime` (range 1.0–888.0), `reporting_petty_other` ( , Dadka dhexdiida, Grt urur l dhaho), `police_yearly_trend` (range 1.0–888.0) and 24 others.

**Demographic:** `village_name` (range 1.0–5.0), `gender_responder` (range 1.0–2.0), `age` (range 1.0–6.0).

**Outcome / Measurement:** `number_of_stations` ( , 1, 2), `number_of_stations_other` ( ), `number_of_courts` ( , 1, 777), `number_of_courts_other` ( ), `number_of_conflicts` and 2 others.

**Identifier / Metadata:** `legal_clinic_ref`, `legal_clinic_ref_other`, `court_ref`, `court_ref_other`, `elders_ref` and 8 others.

**Other:** `serial` (range 1.0–206.0), `marital_status` (range 1.0–888.0), `level_education` (range 1.0–7.0), `police_presense` (range 1.0–888.0), `distance_to_station` ( , 1, 2) and 65 others.
---

## Quick Start

```python
from datasets import load_dataset

# Download both splits of the dataset from the Hugging Face Hub.
ds = load_dataset("electricsheepafrica/africa-galkayo-district-conflict-and-security-assessment-2015")

# Convert each split to a pandas DataFrame.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())  # a bare train.head() only displays in a notebook
```

---

## Schema

| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `serial` | int64 | 0.0% | 1.0 – 206.0 (mean 103.8426) |
| `region_name` | int64 | 0.0% | 1.0 – 1.0 (mean 1.0) |
| `district_name` | int64 | 0.0% | 1.0 – 1.0 (mean 1.0) |
| `village_name` | int64 | 0.0% | 1.0 – 5.0 (mean 2.7868) |
| `gender_responder` | int64 | 0.0% | 1.0 – 2.0 (mean 1.4822) |
| `age` | int64 | 0.0% | 1.0 – 6.0 (mean 2.934) |
| `marital_status` | int64 | 0.0% | 1.0 – 888.0 (mean 10.797) |
| `level_education` | int64 | 0.0% | 1.0 – 7.0 (mean 3.6142) |
| `police_presense` | int64 | 0.0% | 1.0 – 888.0 (mean 62.3299) |
| `number_of_stations` | object | 0.0% | , 1, 2 |
| `number_of_stations_other` | object | 0.0% | |
| `distance_to_station` | object | 0.0% | , 1, 2 |
| `reporting_civil` | int64 | 0.0% | 1.0 – 888.0 (mean 35.6193) |
| `reporting_civil_other` | object | 0.0% | , Xoghayaha xaafada, Dadkadhexdiisa |
| `reporting_petty_crime` | int64 | 0.0% | 1.0 – 888.0 (mean 43.3046) |
| `reporting_petty_other` | object | 0.0% | , Dadka dhexdiida, Grt urur l dhaho |
| `reporting_serious_crime` | int64 | 0.0% | 1.0 – 888.0 (mean 31.802) |
| `reporting_serious_other` | object | 0.0% | |
| `trusted_sec_prov` | int64 | 0.0% | 1.0 – 777.0 (mean 31.0558) |
| `trusted_sec_other` | object | 0.0% | , Maxkamada, Xoghayaha xaafada |
| `reason_for_choice_sec` | float64 | 5.1% | 1.0 – 777.0 (mean 6.4492) |
| `reason_for_choice_sec_other` | object | 0.0% | , Dadaaal badan ayay sameyan, Waa odayaal dhaqameed dee |
| `level_trust_police` | int64 | 0.0% | 1.0 – 888.0 (mean 42.6701) |
| `police_yearly_trend` | int64 | 0.0% | 1.0 – 888.0 (mean 120.9492) |
| `court_presense` | int64 | 0.0% | 1.0 – 888.0 (mean 53.4112) |
| `number_of_courts` | object | 0.0% | , 1, 777 |
| `number_of_courts_other` | object | 0.0% | |
| `where_is_court` | object | 0.0% | |
| `distance_to_court` | object | 0.0% | |
| `legal_clinic_aware` | int64 | 0.0% | 1.0 – 777.0 (mean 29.467) |
| `legal_clinic_use` | object | 0.0% | |
| `legal_clinic_ref` | object | 0.0% | |
| `legal_clinic_ref_other` | object | 0.0% | |
| `legal_clinic_issue` | object | 0.0% | |
| `legal_clinic_issue_other` | object | 0.0% | |
| `legal_clinic_judgement` | object | 0.0% | |
| `legal_clinic_enforced` | object | 0.0% | |
| `court_use` | int64 | 0.0% | 1.0 – 888.0 (mean 18.2437) |
| `court_ref` | object | 0.0% | |
| `court_ref_other` | object | 0.0% | |
| `court_issue` | object | 0.0% | |
| `court_issue_other` | object | 0.0% | |
| `court_judgement` | object | 0.0% | |
| `court_enforced` | object | 0.0% | |
| `elders_use` | int64 | 0.0% | 1.0 – 777.0 (mean 9.6802) |
| `elders_ref` | object | 0.0% | |
| `elders_ref_other` | object | 0.0% | |
| `elders_issue` | object | 0.0% | |
| `elders_issue_other` | object | 0.0% | |
| `elders_judgement` | object | 0.0% | |
| `elders_enforced` | object | 0.0% | |
| `religious_use` | int64 | 0.0% | |
| `religious_ref` | object | 0.0% | |
| `religious_ref_other` | object | 0.0% | |
| `religious_issue` | object | 0.0% | |
| `religious_issue_other` | object | 0.0% | |
| `religious_judgement` | object | 0.0% | |
| `religious_enforced` | object | 0.0% | |
| `trusted_just_prov` | int64 | 0.0% | |
| `trusted_just_prov_other` | object | 0.0% | |
| `reason_for_choice_just` | float64 | 7.6% | |
| `reason_for_choice_just_other` | object | 0.0% | |
| `conf_formal_just` | int64 | 0.0% | |
| `court_yearly_trend` | int64 | 0.0% | |
| `local_council_aware` | int64 | 0.0% | |
| `aware_of_services` | object | 0.0% | |
| `channels_comm` | object | 0.0% | |
| `consultation_participation` | object | 0.0% | |
| `participation_frequency` | object | 0.0% | |
| `participation_frequency_other` | object | 0.0% | |
| `elected_opinion` | int64 | 0.0% | |
| `loc_gov_serviceseducation` | object | 0.0% | |
| `loc_gov_serviceshealth` | object | 0.0% | |
| `loc_gov_servicessecurity` | object | 0.0% | |
| `loc_gov_servicesjustice` | object | 0.0% | |
| `loc_gov_servicesagriculture` | object | 0.0% | |
| `loc_gov_servicesinfrastructure` | object | 0.0% | |
| `loc_gov_servicessanitation` | object | 0.0% | |
| `loc_gov_serviceswater` | object | 0.0% | |
| `loc_gov_servicesother` | object | 0.0% | |
| `loc_gov_servicesdont_know` | object | 0.0% | |
| `loc_gov_servicesrefused_to_answer` | object | 0.0% | |
| `loc_gov_services_other` | object | 0.0% | |
| `community_issueslack_of_water` | object | 0.0% | |
| `community_issuesdrought` | object | 0.0% | |
| `community_issueslack_of_infrastructure` | object | 0.0% | |
| `community_issuespoor_sanitation` | object | 0.0% | |
| `community_issuespoor_health` | object | 0.0% | |
| `community_issuesunemployment` | object | 0.0% | |
| `community_issuespoor_education` | object | 0.0% | |
| `community_issuesshortage_of_electicity_supply` | object | 0.0% | |
| `community_issuespoor_economy` | object | 0.0% | |
| `community_issuescharcoal_production_deforestation` | object | 0.0% | |
| `community_issuesbad_health_centers` | object | 0.0% | |
| `community_issuesinsecurity` | object | 0.0% | |
| `community_issuesgender_based_violence` | object | 0.0% | |
| `community_issuesother` | object | 0.0% | |
| `community_issuesdont_know` | object | 0.0% | |
| `community_issuesrefused_to_answer` | object | 0.0% | |
| `community_issues_other` | object | 0.0% | |
| `council_yearly_trend` | object | 0.0% | |
| `witnessed_conflict` | int64 | 0.0% | |
| `number_of_conflicts` | object | 0.0% | |
| `number_conf_violence` | object | 0.0% | |
| `number_casualties` | object | 0.0% | |
| `conflict_reasonresources` | object | 0.0% | |
| `conflict_reasonfamily_disputes` | object | 0.0% | |
| `conflict_reasoncrime` | object | 0.0% | |
| `conflict_reasonpower` | object | 0.0% | |
| `conflict_reasonrevenge` | object | 0.0% | |
| `conflict_reasonbusiness_disputes` | object | 0.0% | |
| `conflict_reasonrape` | object | 0.0% | |
| `conflict_reasonlack_of_justice` | object | 0.0% | |
| `conflict_reasonother` | object | 0.0% | |
| `conflict_reasondont_know` | object | 0.0% | |
| `conflict_reasonrefused_to_answer` | object | 0.0% | |
| `conflict_reason_other` | object | 0.0% | |
| `witnessed_crimes` | int64 | 0.0% | |
| `how_safe` | int64 | 0.0% | |
| `safety_yearly_trend` | int64 | 0.0% | |
| `esa_source` | object | 0.0% | |
| `esa_processed` | object | 0.0% | |

---

## Numeric Summary

| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `serial` | 1.0 | 206.0 | 103.8426 | 104.0 |
| `region_name` | 1.0 | 1.0 | 1.0 | 1.0 |
| `district_name` | 1.0 | 1.0 | 1.0 | 1.0 |
| `village_name` | 1.0 | 5.0 | 2.7868 | 3.0 |
| `gender_responder` | 1.0 | 2.0 | 1.4822 | 1.0 |
| `age` | 1.0 | 6.0 | 2.934 | 3.0 |
| `marital_status` | 1.0 | 888.0 | 10.797 | 2.0 |
| `level_education` | 1.0 | 7.0 | 3.6142 | 4.0 |
| `police_presense` | 1.0 | 888.0 | 62.3299 | 2.0 |
| `reporting_civil` | 1.0 | 888.0 | 35.6193 | 4.0 |
| `reporting_petty_crime` | 1.0 | 888.0 | 43.3046 | 4.0 |
| `reporting_serious_crime` | 1.0 | 888.0 | 31.802 | 5.0 |
| `trusted_sec_prov` | 1.0 | 777.0 | 31.0558 | 4.0 |
| `reason_for_choice_sec` | 1.0 | 777.0 | 6.4492 | 2.0 |
| `level_trust_police` | 1.0 | 888.0 | 42.6701 | 3.0 |

---

## Curation

Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. 2 column(s) were cast from string to numeric or datetime based on parse-success rate (>85% threshold). The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
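The curation steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Electric Sheep Africa's actual pipeline: all helper names are hypothetical, `None` stands in for `NaN`, the snake_case rule is a guess (published names such as `loc_gov_serviceseducation` suggest some separators were simply dropped), and the rounding of the 20% test size is assumed, so this will not reproduce the published row assignment.

```python
import random
import re

# Missing-value markers listed in the curation notes, compared case-insensitively.
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def to_snake_case(name: str) -> str:
    """Hypothetical rule: lowercase, then collapse non-alphanumeric runs into '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def unify_missing(value):
    """Map any listed missing-value marker to None (a stand-in for NaN)."""
    if isinstance(value, str) and value.strip().lower() in MISSING_MARKERS:
        return None
    return value


def try_float(value):
    """Return value as a float, or None if it does not parse."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def maybe_cast_numeric(column, threshold=0.85):
    """Cast a string column to floats only if more than `threshold` of its
    non-missing values parse; values that fail to parse then become None."""
    present = [v for v in column if v is not None]
    if not present:
        return column
    rate = sum(try_float(v) is not None for v in present) / len(present)
    if rate > threshold:
        return [None if v is None else try_float(v) for v in column]
    return column


def split_80_20(n_rows, seed=42):
    """Shuffle row indices with a fixed seed; the first 20% (rounded) become the test split."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(0.2 * n_rows)
    return indices[n_test:], indices[:n_test]


if __name__ == "__main__":
    # Toy table: a header row plus three data rows, one with an "N/A" marker.
    header = ["Serial", "Number of Stations"]
    rows = [["1", "2"], ["2", "N/A"], ["3", "1"]]
    columns = [to_snake_case(h) for h in header]
    cleaned = [[unify_missing(v) for v in row] for row in rows]
    by_column = {c: [row[i] for row in cleaned] for i, c in enumerate(columns)}
    by_column = {c: maybe_cast_numeric(vals) for c, vals in by_column.items()}
    print(by_column)
    train_idx, test_idx = split_80_20(len(rows))
    print(len(train_idx), len(test_idx))
```

The parse-rate check mirrors the card's ">85% threshold" rule; pandas users would typically get the same effect with `pd.to_numeric(..., errors="coerce")` followed by a check on the share of non-null results.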
---

## Limitations

- Data originates from Observatory of Conflict and Violence Prevention (inactive) and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/galkayo-district-conflict-and-security-assessment-2015) for the publisher's own methodology notes and caveats.

---

## Citation

```bibtex
@dataset{hdx_africa_galkayo_district_conflict_and_security_assessment_2015,
  title  = {Galkayo District Conflict and Security Assessment - 2015},
  author = {{Observatory of Conflict and Violence Prevention (inactive)}},
  year   = {2023},
  url    = {https://data.humdata.org/dataset/galkayo-district-conflict-and-security-assessment-2015},
  note   = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```

---

*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica): Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider:
electricsheepafrica
Summary
Dataset Introduction
Construction
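In conflict and security assessment, the rigor of data collection directly determines how reliable the research conclusions are. This dataset was built from field surveys conducted by the Observatory of Conflict and Violence Prevention and focuses on community security in Galkayo District, Somalia. The raw data comes from the Humanitarian Data Exchange (HDX) and was standardized by the Electric Sheep Africa team: missing-value markers were unified, column names were converted to snake_case, and some string columns were cast to numeric types based on their parse-success rate. The dataset is split into training and test sets with a fixed random seed for reproducibility and stored as Snappy-compressed Parquet, giving machine-learning applications a structured starting point.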
Characteristics
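The dataset's distinctive value for conflict research lies in its coverage of multiple dimensions of community safety and security. It contains 197 observations across 122 variables (31 numeric, 91 categorical) describing geography, demographics, perceptions of security, and justice services. Geographic coverage is limited to Galkayo District, Somalia, and the unit of observation is the subnational administrative unit; the variables range from police presence and crime reporting to the causes of community conflict. Notably, the dataset contains many categorical variables and numeric fields with special codes, such as 888 used to represent unknown responses, which together provide an empirical basis for analyzing the complexity of community security dynamics.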
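Because special codes sit in the same numeric columns as real answers, they can distort summary statistics: in the card's numeric summary, `marital_status` has a median of 2.0 but a mean of 10.797, and `police_yearly_trend` has a mean of 120.9492 with a maximum of 888.0. The sketch below masks such codes before summarizing. It assumes, following the description above, that 888 is a non-substantive code, and additionally treats 777 (the maximum of columns such as `trusted_sec_prov`) the same way; the real meaning of both codes should be checked against the OCVP questionnaire. The helper names are hypothetical.

```python
from statistics import mean, median

# Assumption: codes 777 and 888 mark non-substantive answers (e.g. "unknown"),
# not real category values. Check this against the OCVP questionnaire.
SENTINELS = {777, 888}


def is_sentinel(value, sentinels=SENTINELS):
    """True if value (int, float, or numeric string such as "777") equals a sentinel code."""
    try:
        return float(value) in sentinels
    except (TypeError, ValueError):
        return False


def mask_sentinels(values, sentinels=SENTINELS):
    """Replace sentinel codes with None so they drop out of numeric summaries."""
    return [None if is_sentinel(v, sentinels) else v for v in values]


def summarize(values):
    """Count, mean, and median over the non-missing values."""
    present = [v for v in values if v is not None]
    return {"n": len(present), "mean": mean(present), "median": median(present)}


# Toy column shaped like `marital_status`: small category codes plus one 888.
raw = [1, 2, 2, 3, 2, 888]
print(summarize(raw))                  # the single 888 pulls the mean far above the median
print(summarize(mask_sentinels(raw)))  # mean and median now agree
```

Parsing through `float` lets the same check cover object columns such as `number_of_courts`, whose sample values include the string `777`.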
Usage
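For machine-learning-driven analysis of community security, the dataset lends itself directly to classification and regression tasks. Researchers can load it with the Hugging Face `datasets` library and convert it to a pandas DataFrame in Python for exploratory analysis. The data is pre-split into 157 training and 39 test examples, supporting the training and evaluation of supervised models. Typical applications include predicting trends in perceived security from demographic and geographic features, or analyzing the relationship between conflict reporting and the use of justice services. Users should consult the original publisher's methodology notes and watch for the coding conventions and potential biases in the data so that their conclusions remain robust.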
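With only 39 test examples, it helps to know what accuracy a trivial model reaches before reading much into a trained classifier's score. The following is a minimal sketch of a majority-class baseline; the choice of `safety_yearly_trend` as the target, the exclusion of codes 777 and 888, and the function name are assumptions for illustration.

```python
from collections import Counter

# Assumption: codes 777 and 888 are not substantive answers, so they are excluded.
SENTINELS = {777, 888}


def majority_baseline(train_labels, test_labels, exclude=SENTINELS):
    """Predict the most common training label for every test row; return (label, accuracy)."""
    train = [y for y in train_labels if y is not None and y not in exclude]
    test = [y for y in test_labels if y is not None and y not in exclude]
    if not train or not test:
        raise ValueError("no usable labels left after excluding sentinel codes")
    majority = Counter(train).most_common(1)[0][0]
    accuracy = sum(y == majority for y in test) / len(test)
    return majority, accuracy


# With the Quick Start DataFrames (network access required), for example:
# label, acc = majority_baseline(train["safety_yearly_trend"].tolist(),
#                                test["safety_yearly_trend"].tolist())
print(majority_baseline([1, 1, 2, 888], [1, 2, 1, 777]))
```

A model that cannot clearly beat this number on the test split has likely learned little beyond the class balance.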
Background and Challenges
Background Overview
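In conflict and security research, fine-grained assessment of specific areas is the foundation for understanding community dynamics and designing effective interventions. The Galkayo District Conflict and Security Assessment - 2015 dataset was created in 2015 by the now-inactive Observatory of Conflict and Violence Prevention to systematically collect primary data in Galkayo District, capital of the Mudug region in central Somalia. It centers on community security and stability, with variables spanning geography, demographics, justice services, and conflict events, and provides an empirical basis for public health and humanitarian action. Its release adds to the micro-level data available for conflict research in Africa and opens new avenues for applying machine-learning models to complex social questions.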
Current Challenges
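The dataset addresses the challenge of assessing security conditions in conflict-affected areas, which centers on extracting reliable information from a highly heterogeneous and sensitive social environment to support decisions such as conflict prediction and resource allocation. Its construction faced several difficulties. The raw data was collected in an unstable part of Somalia, where security conditions and limited respondent trust may introduce sampling bias and inconsistent reporting. The dataset has 122 variables, 91 of them categorical, together with many missing or blank entries and special codes (such as 888 and 777), which makes data cleaning and feature engineering demanding. Finally, the sample is small (197 rows in total), so machine-learning models trained on it are prone to overfitting and poor generalization.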
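With under 200 rows, a single 39-row test split gives a noisy performance estimate. One common mitigation, sketched here under the assumption that the published train and test splits are pooled first, is k-fold cross-validation, in which every row is held out exactly once; scikit-learn's `KFold` implements the same idea. The function below is a hypothetical standard-library illustration, not part of the dataset's tooling.

```python
import random


def kfold_indices(n_rows, k=5, seed=42):
    """Yield (train_idx, val_idx) pairs so that every row is validated exactly once."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


# Pool the published splits first, e.g. pandas.concat([train, test], ignore_index=True),
# then score a model on each fold and report the mean and spread of the k scores.
n_rows = 157 + 39  # rows in the published train and test splits
for fold, (tr, va) in enumerate(kfold_indices(n_rows, k=5)):
    print(fold, len(tr), len(va))
```

Reporting the spread across folds, not just the mean, makes the instability caused by the small sample visible.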
Common Scenarios
Classic Use Case
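In conflict and security research, the dataset is typically used to build predictive models of community security conditions. By analyzing Galkayo residents' responses on variables such as police presence, crime-reporting trends, and trust in justice institutions, researchers can train classification or regression algorithms to identify the key factors shaping perceived community safety. Such models reveal underlying patterns in security dynamics and give subsequent intervention planning a data-driven basis.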
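Before fitting a model, a per-group breakdown is a simple way to check whether a candidate factor is associated with perceived safety. The sketch below computes, for each group code, the share of each outcome code. Pairing `gender_responder` with `how_safe` is only an example, the helper name is hypothetical, codes 777 and 888 are assumed to be non-substantive, and the labels behind the numeric codes are not given in the card, so they have to be looked up in the OCVP questionnaire.

```python
from collections import Counter, defaultdict

# Assumption: 777 and 888 are non-substantive codes and are skipped.
SENTINELS = {777, 888}


def response_shares(groups, outcomes, exclude=SENTINELS):
    """For each group code, return the share of rows giving each outcome code.

    Rows where either value is missing or a sentinel code are skipped.
    """
    counts = defaultdict(Counter)
    for g, y in zip(groups, outcomes):
        if g is None or y is None or g in exclude or y in exclude:
            continue
        counts[g][y] += 1
    return {
        g: {y: n / sum(c.values()) for y, n in sorted(c.items())}
        for g, c in sorted(counts.items())
    }


# Toy example; with real data, pass e.g. train["gender_responder"].tolist()
# and train["how_safe"].tolist() from the Quick Start DataFrames.
print(response_shares([1, 1, 2, 2, 2, 1], [1, 2, 1, 1, 888, 1]))
```

With so few respondents per group, differences in these shares are descriptive only and should not be read as significant without a proper test.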
Practical Applications
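In practice, the dataset supports humanitarian organizations and local governance bodies in planning targeted security interventions. Drawing on the crime-reporting patterns, barriers to using justice services, and community priorities reflected in the data, decision-makers can design better-targeted police deployments, legal-aid programs, or community-dialogue initiatives, improving security services and residents' well-being in conflict-affected areas such as Galkayo.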
Derived Work
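Classic work derived from this dataset includes machine-learning-based community security risk-assessment frameworks and comparative studies of cross-regional conflict prediction. For example, a security-perception index that researchers built from it has been integrated into broader African conflict datasets to test the correlation between governance indicators and the incidence of violence, which in turn has produced several academic studies on early warning and resilience-building for local conflicts.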
The above content was collected and summarized by 遇见数据集.