
electricsheepafrica/africa-ucdp-data-for-ghana

Hugging Face · Updated 2026-04-11 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-ucdp-data-for-ghana
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- tabular-classification
- other
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- conflict-violence
- hxl
- gha
pretty_name: "Ghana - Data on Conflict Events"
dataset_info:
  splits:
  - name: train
    num_examples: 44
  - name: test
    num_examples: 11
---

# Ghana - Data on Conflict Events

**Publisher:** HDX · **Source:** [HDX](https://data.humdata.org/dataset/ucdp-data-for-ghana) · **License:** `cc-by-igo` · **Updated:** 2026-04-03

---

## Abstract

This dataset is UCDP's most disaggregated dataset, covering individual events of organized violence (phenomena of lethal violence occurring at a given time and place). These events are sufficiently fine-grained to be geo-coded down to the level of individual villages, with temporal durations disaggregated to single, individual days.

Sundberg, Ralph, and Erik Melander, 2013, “Introducing the UCDP Georeferenced Event Dataset”, Journal of Peace Research, vol. 50, no. 4, 523-532

Högbladh, Stina, 2019, “UCDP GED Codebook version 19.1”, Department of Peace and Conflict Research, Uppsala University

Each row in this dataset represents first-level administrative unit observations. Temporal coverage is indicated by the `date_start`, `date_end` column(s). Geographic scope: **GHA**.

*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*

---

## Dataset Characteristics

| | |
|---|---|
| **Domain** | Conflict and security |
| **Unit of observation** | First-level administrative unit observations |
| **Rows (total)** | 55 |
| **Columns** | 50 (26 numeric, 21 categorical, 2 datetime) |
| **Train split** | 44 rows |
| **Test split** | 11 rows |
| **Geographic scope** | GHA |
| **Publisher** | HDX |
| **HDX last updated** | 2026-04-03 |

---

## Variables

**Geographic**: `year` (range 1991.0–2024.0), `active_year`, `type_of_violence` (range 2.0–2.0), `dyad_dset_id` (range 5176.0–5360.0), `dyad_new_id` (range 5176.0–5360.0) and 9 others.

**Temporal**: `source_date` (2008-01-30, 2023-02-05, 2023-01-10), `date_prec` (range 1.0–5.0), `date_start`, `date_end`.

**Outcome / Measurement**: `number_of_sources` (range -1.0–4.0), `deaths_a` (range 0.0–24.0), `deaths_b`, `deaths_civilians`, `deaths_unknown`.

**Identifier / Metadata**: `id` (range 11998.0–513853.0), `relid` (GHA-2002-2-3-1, GHA-1994-2-208-2, GHA-2024-2-5268-2), `code_status` (Clear), `conflict_dset_id` (range 5176.0–5360.0), `conflict_new_id` (range 4566.0–4750.0) and 14 others.

**Other**: `where_prec` (range 1.0–4.0), `where_description`, `adm_1`, `adm_2`, `geom_wkt` and 3 others.

---

## Quick Start

```python
from datasets import load_dataset

# Download both splits from the Hugging Face Hub.
ds = load_dataset("electricsheepafrica/africa-ucdp-data-for-ghana")

# Convert each split to a pandas DataFrame for exploration.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
train.head()
```

---

## Schema

| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `id` | int64 | 0.0% | 11998.0 – 513853.0 (mean 160574.5273) |
| `relid` | object | 0.0% | GHA-2002-2-3-1, GHA-1994-2-208-2, GHA-2024-2-5268-2 |
| `year` | int64 | 0.0% | 1991.0 – 2024.0 (mean 2006.6364) |
| `active_year` | bool | 0.0% | |
| `code_status` | object | 0.0% | Clear |
| `type_of_violence` | int64 | 0.0% | 2.0 – 2.0 (mean 2.0) |
| `conflict_dset_id` | int64 | 0.0% | 5176.0 – 5360.0 (mean 5303.0545) |
| `conflict_new_id` | int64 | 0.0% | 4566.0 – 4750.0 (mean 4693.0545) |
| `conflict_name` | object | 0.0% | Kusasi - Mamprusi, Dagomba, Gonja, Nanumba - Konkomba, Gonja - Konkomba, Nawuri |
| `dyad_dset_id` | int64 | 0.0% | 5176.0 – 5360.0 (mean 5303.0545) |
| `dyad_new_id` | int64 | 0.0% | 5176.0 – 5360.0 (mean 5303.0545) |
| `dyad_name` | object | 0.0% | Kusasi - Mamprusi, Dagomba, Gonja, Nanumba - Konkomba, Gonja - Konkomba, Nawuri |
| `side_a_dset_id` | int64 | 0.0% | 570.0 – 3802.0 (mean 1946.7273) |
| `side_a_new_id` | int64 | 0.0% | 570.0 – 3802.0 (mean 1946.7273) |
| `side_a` | object | 0.0% | Kusasi, Dagomba, Gonja, Nanumba, Gonja |
| `side_b_dset_id` | int64 | 0.0% | 892.0 – 3803.0 (mean 1255.9273) |
| `side_b_new_id` | int64 | 0.0% | 892.0 – 3803.0 (mean 1255.9273) |
| `side_b` | object | 0.0% | Mamprusi, Konkomba, Konkomba, Nawuri |
| `number_of_sources` | int64 | 0.0% | -1.0 – 4.0 (mean -0.1455) |
| `source_article` | object | 0.0% | Wienia Martijn "Ominous calm, Reuters 1995-03-14 "Ethnic clashes in northeast Ghana kill eight"; 1995-03-15 "Curfew clamped on northern Ghana to curb clashes"; 1995-03-24 "Ghana says at least 110 dead in northern conflict"; ARB vol 32 no 3 (1995-04-26) 11792 "Inter-Ethnic Violence, Reuters 1992-05-25 "AT LEAST 63 KILLED IN GHANA ETHNIC CLASHES |
| `source_office` | object | 65.5% | Modern Ghana, All Africa, Africa Research Bulletin |
| `source_date` | object | 65.5% | 2008-01-30, 2023-02-05, 2023-01-10 |
| `source_headline` | object | 65.5% | Don't Politicize Bawku Conflict - Bartels, IN BRIEF: Ghana, Stop the highway attacks – Kusaug Youth Movement |
| `source_original` | object | 47.3% | |
| `where_prec` | int64 | 0.0% | 1.0 – 4.0 (mean 1.6545) |
| `where_coordinates` | object | 0.0% | |
| `where_description` | object | 0.0% | |
| `adm_1` | object | 0.0% | |
| `adm_2` | object | 10.9% | |
| `latitude` | float64 | 0.0% | 8.4667 – 11.0796 (mean 10.1714) |
| `longitude` | float64 | 0.0% | -1.0 – 0.0667 (mean -0.3164) |
| `geom_wkt` | object | 0.0% | |
| `priogrid_gid` | int64 | 0.0% | 141480.0 – 145800.0 (mean 144372.9818) |
| `country` | object | 0.0% | |
| `iso3` | object | 0.0% | |
| `country_id` | int64 | 0.0% | 452.0 – 452.0 (mean 452.0) |
| `region` | object | 0.0% | |
| `event_clarity` | int64 | 0.0% | 1.0 – 2.0 (mean 1.3455) |
| `date_prec` | int64 | 0.0% | 1.0 – 5.0 (mean 1.9091) |
| `date_start` | datetime64[ns] | 0.0% | |
| `date_end` | datetime64[ns] | 0.0% | |
| `deaths_a` | int64 | 0.0% | 0.0 – 24.0 (mean 1.2182) |
| `deaths_b` | int64 | 0.0% | |
| `deaths_civilians` | int64 | 0.0% | |
| `deaths_unknown` | int64 | 0.0% | |
| `best` | int64 | 0.0% | |
| `high` | int64 | 0.0% | |
| `low` | int64 | 0.0% | |
| `esa_source` | object | 0.0% | |
| `esa_processed` | object | 0.0% | |

---

## Numeric Summary

| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `id` | 11998.0 | 513853.0 | 160574.5273 | 12619.0 |
| `year` | 1991.0 | 2024.0 | 2006.6364 | 2008.0 |
| `type_of_violence` | 2.0 | 2.0 | 2.0 | 2.0 |
| `conflict_dset_id` | 5176.0 | 5360.0 | 5303.0545 | 5268.0 |
| `conflict_new_id` | 4566.0 | 4750.0 | 4693.0545 | 4658.0 |
| `dyad_dset_id` | 5176.0 | 5360.0 | 5303.0545 | 5268.0 |
| `dyad_new_id` | 5176.0 | 5360.0 | 5303.0545 | 5268.0 |
| `side_a_dset_id` | 570.0 | 3802.0 | 1946.7273 | 1057.0 |
| `side_a_new_id` | 570.0 | 3802.0 | 1946.7273 | 1057.0 |
| `side_b_dset_id` | 892.0 | 3803.0 | 1255.9273 | 1058.0 |
| `side_b_new_id` | 892.0 | 3803.0 | 1255.9273 | 1058.0 |
| `number_of_sources` | -1.0 | 4.0 | -0.1455 | -1.0 |
| `where_prec` | 1.0 | 4.0 | 1.6545 | 1.0 |
| `latitude` | 8.4667 | 11.0796 | 10.1714 | 10.9721 |
| `longitude` | -1.0 | 0.0667 | -0.3164 | -0.2417 |

---

## Curation

Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. 2 column(s) with >80% missing values were removed: `gwnoa`, `gwnob`. 2 column(s) were cast from string to numeric or datetime based on parse-success rate (>85% threshold). The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.

---

## Limitations

- Data originates from HDX and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- The following columns have >20% missing values and should be treated with caution in modelling: `source_office`, `source_date`, `source_headline`, `source_original`.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/ucdp-data-for-ghana) for the publisher's own methodology notes and caveats.

---

## Citation

```bibtex
@dataset{hdx_africa_ucdp_data_for_ghana,
  title = {Ghana - Data on Conflict Events},
  author = {HDX},
  year = {2026},
  url = {https://data.humdata.org/dataset/ucdp-data-for-ghana},
  note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```

---

*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica): Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider:
electricsheepafrica
Collected summary
Dataset introduction
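Construction method
In conflict and security research, fine-grained georeferenced event data is essential for understanding the dynamics of violence. This dataset is built on the Georeferenced Event Dataset of the Uppsala Conflict Data Program (UCDP). The original data was collected through a systematic event-coding process and covers events of organized violence in Ghana. Collection draws on a range of public sources, including news reports, research reports, and official records, so that each event carries a date and geographic coordinates, geocoded down to village level where the sources allow. The Electric Sheep Africa team then obtained the raw data through the HDX platform and applied systematic cleaning and standardization, including unifying missing-value markers and converting data types. The data was split 80/20 into training and test sets and packaged as Parquet, ready for use in machine learning applications.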
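The curation steps described above can be approximated with pandas. The following is a minimal sketch under stated assumptions, not Electric Sheep Africa's actual pipeline: the function name `curate`, the helper `is_text`, the case-insensitive marker matching, and the shuffle-then-cut split are all hypothetical, and only the numeric cast is shown (the datetime cast is omitted).

```python
import pandas as pd

# Missing-value markers named in the dataset card, lowercased for matching.
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def is_text(col: pd.Series) -> bool:
    """True for object or string columns (hypothetical helper)."""
    return pd.api.types.is_object_dtype(col) or pd.api.types.is_string_dtype(col)


def curate(raw: pd.DataFrame, seed: int = 42):
    """Hypothetical re-creation of the described cleaning and 80/20 split."""
    df = raw.copy()
    # Lowercase column names and replace spaces and hyphens with underscores.
    df.columns = [c.strip().lower().replace(" ", "_").replace("-", "_") for c in df.columns]
    # Unify missing-value markers to NaN in text columns.
    for name in df.columns:
        if is_text(df[name]):
            is_marker = df[name].astype(str).str.strip().str.lower().isin(MISSING_MARKERS)
            df[name] = df[name].mask(is_marker)
    # Drop columns with more than 80% missing values.
    df = df.loc[:, df.isna().mean() <= 0.80].copy()
    # Cast text columns to numeric when more than 85% of non-null values parse.
    for name in df.columns:
        if is_text(df[name]):
            non_null = df[name].notna().sum()
            parsed = pd.to_numeric(df[name], errors="coerce")
            if non_null and parsed.notna().sum() / non_null > 0.85:
                df[name] = parsed
    # Shuffle with a fixed seed, then cut at 80% of the rows.
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    cut = len(shuffled) * 4 // 5
    return shuffled.iloc[:cut], shuffled.iloc[cut:]


# Toy input with made-up values, sized to match the card's 55 rows.
raw = pd.DataFrame({
    "ID": [str(i) for i in range(55)],
    "Side A": ["Kusasi"] * 54 + ["N/A"],
    "Mostly Empty": ["-"] * 50 + ["x"] * 5,
})
train_df, test_df = curate(raw)
print(train_df.shape, test_df.shape)
```

With 55 input rows this reproduces the 44/11 split sizes reported in the card, although the row assignment will not match the published splits.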
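Usage
In machine-learning research on conflict prediction and pattern recognition, the dataset provides structured input that can be used directly for model training. Researchers can load it with the Hugging Face `datasets` library and convert it to a pandas DataFrame in Python for exploratory analysis or feature engineering. The dataset is pre-split into a training set of 44 records and a test set of 11 records, which supports supervised tasks such as classification or regression on temporal, geographic, and conflict features. Some source-related fields have high missing rates, so handle them with care during modelling or impute them where appropriate. Users should also consult the methodology notes on the original HDX page to understand the limitations and background assumptions of the data collection.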
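One way to handle the sparse source fields is to measure each column's missing share and set aside the columns above a threshold before modelling. The sketch below assumes a 20% cutoff, mirroring the card's Limitations section; the function name `split_by_missingness` and the toy values are illustrative only.

```python
import pandas as pd


def split_by_missingness(df: pd.DataFrame, threshold: float = 0.20):
    """Return (frame without sparse columns, list of sparse column names)."""
    null_share = df.isna().mean()  # fraction of missing values per column
    sparse = null_share[null_share > threshold].index.tolist()
    return df.drop(columns=sparse), sparse


# Toy frame with made-up values; column names follow the dataset schema.
toy = pd.DataFrame({
    "best": [1, 0, 3, 2, 5],
    "source_office": ["Modern Ghana", None, None, None, "All Africa"],
    "adm_1": ["Region A", "Region A", "Region B", "Region B", "Region A"],
})
features, sparse_cols = split_by_missingness(toy)
print(sparse_cols)
```

Dropping is the simplest choice for free-text provenance fields such as `source_headline`; imputation is more relevant for numeric or categorical features.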
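Background and challenges
Background overview
In conflict and security research, fine-grained, geocoded data on events of organized violence underpins the study of conflict dynamics, the assessment of humanitarian risk, and the design of interventions. The Ghana conflict events dataset derives from the Georeferenced Event Dataset of the Uppsala Conflict Data Program (UCDP) at the Department of Peace and Conflict Research, Uppsala University. Ralph Sundberg and Erik Melander introduced the dataset in 2013, and Stina Högbladh authored the 2019 codebook. The data covers Ghana and contains 55 records from 1991 to 2024, described in the dataset card as first-level administrative unit observations. Its core purpose is to expose the patterns, causes, and evolution of conflict through event data with high spatial and temporal resolution, providing an empirical basis for peacebuilding and policy analysis in Africa.
Current challenges
The dataset targets conflict event prediction and pattern recognition, where the central difficulty is extracting reliable structured information from sparse, heterogeneous source reports so that machine learning models can represent the occurrence, scale, and geographic distribution of violent events. Construction raises several challenges. First, the sources are media reports and official records, which suffer from reporting bias, missing information, and inconsistent definitions. Key fields such as `source_office` and `source_date` are more than 20% missing, which affects completeness and consistency. Second, geocoding and temporal precision are hard to handle: the variation in `where_prec` and `date_prec` means that spatial and temporal uncertainty needs careful treatment. Third, the dataset is small (only 55 rows), which limits the training of complex models and calls for feature engineering or transfer learning to compensate.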
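A simple way to account for this uncertainty is to restrict an analysis to events whose location and date are coded precisely. The sketch below assumes, following the UCDP GED codebook convention, that lower values of `where_prec` and `date_prec` indicate higher precision. The cutoffs of 2 and the function name `filter_by_precision` are arbitrary illustrations, not recommendations from the dataset authors.

```python
import pandas as pd


def filter_by_precision(events: pd.DataFrame,
                        max_where_prec: int = 2,
                        max_date_prec: int = 2) -> pd.DataFrame:
    """Keep events coded at or below the given precision codes."""
    keep = (events["where_prec"] <= max_where_prec) & (events["date_prec"] <= max_date_prec)
    return events.loc[keep].reset_index(drop=True)


# Toy events with made-up values; column names follow the dataset schema.
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "where_prec": [1, 4, 2, 1],
    "date_prec": [1, 1, 5, 2],
})
precise = filter_by_precision(toy)
print(precise["id"].tolist())
```

With only 55 rows, any such filter trades precision against sample size, so the thresholds are worth reporting alongside results.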
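Common scenarios
Classic use cases
In conflict and security research, the dataset gives scholars a fine-grained record of organized violence in Ghana. Its classic use is spatio-temporal pattern analysis of conflict events: with geocoded locations and event dates, researchers can examine how violent events are distributed within specific administrative regions and how they evolve. Such analyses typically focus on identifying conflict hotspots, assessing event clustering, and tracking long-term trends, which lays the data foundation for understanding local mechanisms of violence.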
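A first pass at this kind of analysis is to count events and sum fatalities per region and year. The sketch below assumes that `best` holds the best estimate of fatalities, as in the UCDP GED codebook. The function name `events_by_region_year` and the region labels are placeholders, because the dataset card does not list actual `adm_1` values.

```python
import pandas as pd


def events_by_region_year(events: pd.DataFrame) -> pd.DataFrame:
    """Count events and sum best-estimate deaths per adm_1 and year."""
    return (
        events.groupby(["adm_1", "year"], as_index=False)
        .agg(n_events=("id", "count"), best_deaths=("best", "sum"))
        .sort_values(["adm_1", "year"], ignore_index=True)
    )


# Toy events with made-up values; "Region A" and "Region B" are placeholders.
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "adm_1": ["Region A", "Region A", "Region B", "Region A"],
    "year": [1994, 1994, 1994, 2008],
    "best": [3, 0, 5, 1],
})
summary = events_by_region_year(toy)
print(summary)
```

The resulting table can feed a hotspot map or a per-region time series. Running it on the combined train and test splits avoids artefacts from the random 80/20 partition.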
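Academic problems addressed
The dataset addresses the problem of insufficient data granularity in conflict research. Village-level geocoding and single-day temporal precision make it possible to study the dynamics of violence at the micro level. It helps scholars test theoretical hypotheses about ethnic conflict, resource competition, and the diffusion of violence, and it supports the use of quantitative methods in peace and conflict research. Its structured format also facilitates cross-case comparison and causal inference, deepening academic understanding of conflict drivers and mitigation strategies.
Practical applications
In practice, the dataset serves policymakers and international organizations as a situational-awareness tool. Humanitarian agencies can use the location and timing of conflict events to plan resource allocation and emergency response, for example by deploying aid in areas with frequent violence. Security analysts can use the historical event record to assess regional stability risks and to support early-warning systems, making conflict prevention and peacebuilding work more targeted and timely.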
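Recent research on the dataset
Latest research directions
In conflict and security research, fine-grained analysis of georeferenced event data is a current focus. The Ghana conflict events data, built on the UCDP Georeferenced Event Dataset, supports the study of how organized violence in Africa evolves over space and time. Current work concentrates on machine learning methods that combine geospatial variables with time-series features to predict the risk of local conflict outbreaks and their patterns of diffusion. This work informs early-warning systems for humanitarian action and shapes regional stability policy. The administrative-unit observations in the dataset let scholars examine the links between inter-ethnic interaction, resource competition, and violent events, contributing empirical evidence on security dynamics in West Africa.
The above content was collected and summarized by 遇见数据集.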