electricsheepafrica/africa-ucdp-data-for-guinea-bissau
Source: Hugging Face · Updated: 2026-04-10 · Indexed: 2026-04-12
Download link: https://hf-mirror.com/datasets/electricsheepafrica/africa-ucdp-data-for-guinea-bissau
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- tabular-classification
- other
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- conflict-violence
- hxl
- gnb
pretty_name: "Guinea-Bissau - Data on Conflict Events"
dataset_info:
  splits:
  - name: train
    num_examples: 17
  - name: test
    num_examples: 4
---
# Guinea-Bissau - Data on Conflict Events
**Publisher:** HDX · **Source:** [HDX](https://data.humdata.org/dataset/ucdp-data-for-guinea-bissau) · **License:** `cc-by-igo` · **Updated:** 2026-04-03
---
## Abstract
This dataset is the most disaggregated dataset of the Uppsala Conflict Data Program (UCDP). It covers individual events of organized violence, that is, incidents of lethal violence occurring at a given time and place. The events are fine-grained enough to be geo-coded down to individual villages, with temporal durations disaggregated to single days.
Sundberg, Ralph, and Erik Melander, 2013, “Introducing the UCDP Georeferenced Event Dataset”, *Journal of Peace Research*, vol. 50, no. 4, 523-532.
Högbladh, Stina, 2019, “UCDP GED Codebook version 19.1”, Department of Peace and Conflict Research, Uppsala University.
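Each row in this dataset represents a single conflict event, located by first-level administrative unit (`adm_1`) among other geographic fields. Temporal coverage is indicated by the `date_start` and `date_end` columns. Geographic scope: **GNB** (the ISO 3166-1 alpha-3 code for Guinea-Bissau).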
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
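Because each event carries both a `date_start` and a `date_end`, the two columns can be combined into an event duration. The following is a minimal sketch using only the standard library; the helper name `event_duration_days` and the dates are illustrative and are not taken from the dataset.

```python
from datetime import date


def event_duration_days(date_start, date_end):
    """Number of calendar days an event spans, counting both endpoints."""
    return (date_end - date_start).days + 1


# Hypothetical dates for illustration; not rows from the dataset.
print(event_duration_days(date(1998, 6, 7), date(1998, 6, 7)))   # 1
print(event_duration_days(date(1998, 6, 7), date(1998, 6, 13)))  # 7
```

Counting both endpoints means a single-day event has a duration of 1 rather than 0; drop the `+ 1` if an elapsed-days convention is preferred.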
---
## Dataset Characteristics
| Property | Value |
|---|---|
| **Domain** | Conflict and security |
| **Unit of observation** | Individual conflict events |
| **Rows (total)** | 22 |
| **Columns** | 48 (27 numeric, 18 categorical, 2 datetime, 1 boolean) |
| **Train split** | 17 rows |
| **Test split** | 4 rows |
| **Geographic scope** | GNB |
| **Publisher** | HDX |
| **HDX last updated** | 2026-04-03 |
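The counts in this table can be cross-checked with simple arithmetic. The sketch below compares each stated total with the sum of its parts; the helper name `count_gap` is illustrative, and the single boolean column is taken from the Schema table (`active_year`). With the figures on this card, the column types account for all 48 columns, while the two splits sum to 21 rows, one fewer than the stated 22, so it may be worth confirming the row counts after loading.

```python
def count_gap(stated_total, parts):
    """Return stated_total minus the sum of the listed parts (0 means they agree)."""
    return stated_total - sum(parts.values())


# Figures copied from this card.
column_gap = count_gap(48, {"numeric": 27, "categorical": 18, "datetime": 2, "boolean": 1})
row_gap = count_gap(22, {"train": 17, "test": 4})

print(column_gap)  # 0
print(row_gap)     # 1
```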
---
## Variables
**Geographic** — `year` (range 1992.0–2000.0), `active_year`, `type_of_violence` (range 1.0–1.0), `dyad_dset_id` (range 806.0–866.0), `dyad_new_id` (range 806.0–866.0) and 9 others.
**Temporal** — `date_prec` (range 1.0–5.0), `date_start`, `date_end`.
**Outcome / Measurement** — `number_of_sources` (range -1.0–-1.0), `deaths_a` (range 0.0–16.0), `deaths_b`, `deaths_civilians`, `deaths_unknown`.
**Identifier / Metadata** — `id` (range 24829.0–39324.0), `relid` (SEN-1992-1-129-3, SEN-2000-1-129-8.1, GNB-1999-1-58-2), `code_status` (Clear), `conflict_dset_id` (range 375.0–410.0), `conflict_new_id` (range 375.0–410.0) and 12 others.
**Other** — `where_prec` (range 1.0–6.0), `where_description` (Bissau city, Guinea-Bissau, Sao Domingos region), `adm_1`, `adm_2`, `geom_wkt` and 4 others.
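Two conventions are worth keeping in mind when using the outcome columns. Both are assumptions here, based on how the UCDP GED codebook is generally understood, and should be confirmed against the codebook version cited in the Abstract: `best` is expected to equal the sum of the four `deaths_*` columns and to lie between `low` and `high`, and a `number_of_sources` of -1 (the only value in this dataset) is taken to mean that the number of sources was not recorded. A minimal sketch of both checks, with illustrative helper names and hypothetical records:

```python
def fatality_issues(event):
    """Return a list of consistency problems found in one event record."""
    issues = []
    parts = ("deaths_a", "deaths_b", "deaths_civilians", "deaths_unknown")
    if sum(event[p] for p in parts) != event["best"]:
        issues.append("best != sum of deaths_* columns")
    if not event["low"] <= event["best"] <= event["high"]:
        issues.append("expected low <= best <= high")
    return issues


def mask_sentinel(value, sentinel=-1):
    """Map the assumed 'not recorded' sentinel to None, leaving other values alone."""
    return None if value == sentinel else value


# Hypothetical records for illustration; these are not rows from the dataset.
ok_event = {"deaths_a": 2, "deaths_b": 1, "deaths_civilians": 0,
            "deaths_unknown": 0, "best": 3, "low": 3, "high": 5}
bad_event = dict(ok_event, best=4, low=5)

print(fatality_issues(ok_event))            # []
print(fatality_issues(bad_event))           # both checks fail
print(mask_sentinel(-1), mask_sentinel(3))  # None 3
```

Masking the sentinel before modelling avoids treating -1 as a real count of sources.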
---
## Quick Start
```python
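from datasets import load_dataset

# Download the dataset from the Hugging Face Hub; it ships with train and test splits.
ds = load_dataset("electricsheepafrica/africa-ucdp-data-for-guinea-bissau")

# Convert each split to a pandas DataFrame for inspection.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)   # (rows, columns) of the train split
print(train.head())  # first five rows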
```
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `id` | int64 | 0.0% | 24829.0 – 39324.0 (mean 26805.9091) |
| `relid` | object | 0.0% | SEN-1992-1-129-3, SEN-2000-1-129-8.1, GNB-1999-1-58-2 |
| `year` | int64 | 0.0% | 1992.0 – 2000.0 (mean 1998.0455) |
| `active_year` | bool | 0.0% | |
| `code_status` | object | 0.0% | Clear |
| `type_of_violence` | int64 | 0.0% | 1.0 – 1.0 (mean 1.0) |
| `conflict_dset_id` | int64 | 0.0% | 375.0 – 410.0 (mean 405.2273) |
| `conflict_new_id` | int64 | 0.0% | 375.0 – 410.0 (mean 405.2273) |
| `conflict_name` | object | 0.0% | Guinea-Bissau: Government, Senegal: Casamance |
| `dyad_dset_id` | int64 | 0.0% | 806.0 – 866.0 (mean 857.8182) |
| `dyad_new_id` | int64 | 0.0% | 806.0 – 866.0 (mean 857.8182) |
| `dyad_name` | object | 0.0% | Government of Guinea-Bissau - Military Junta for the Consolidation of Democracy, Peace and Justice, Government of Senegal - MFDC |
| `side_a_dset_id` | int64 | 0.0% | 70.0 – 73.0 (mean 70.4091) |
| `side_a_new_id` | int64 | 0.0% | 70.0 – 73.0 (mean 70.4091) |
| `side_a` | object | 0.0% | Government of Guinea-Bissau, Government of Senegal |
| `side_b_dset_id` | int64 | 0.0% | 529.0 – 549.0 (mean 546.2727) |
| `side_b_new_id` | int64 | 0.0% | 529.0 – 549.0 (mean 546.2727) |
| `side_b` | object | 0.0% | Military Junta for the Consolidation of Democracy, Peace and Justice, MFDC |
| `number_of_sources` | int64 | 0.0% | -1.0 – -1.0 (mean -1.0) |
| `source_article` | object | 0.0% | BBC Monitoring Service: Africa 15/4-00; RTP Internacional TV, Lisbon, in Portuguese 1800 gmt 12 Apr 00, BBC Monitoring Service: Africa 17/4-00: Radio France Internationale, Paris, in French 0730 gmt 14 Apr 00, Africa research bulletin (ARB) vol 35 no 11, www.cidcm.umd.edu/inscr/mar/data/sencasamchro.htm |
| `source_original` | object | 22.7% | Senegalese army; Military Junta for the Consolidation of Democracy, Peace and Justice, local rebel leader, Senegalese army |
| `where_prec` | int64 | 0.0% | 1.0 – 6.0 (mean 1.7273) |
| `where_coordinates` | object | 0.0% | Bissau city, Guinea-Bissau, Sao Domingos sector |
| `where_description` | object | 0.0% | Bissau city, Guinea-Bissau, Sao Domingos region |
| `adm_1` | object | 9.1% | |
| `adm_2` | object | 13.6% | |
| `latitude` | float64 | 0.0% | 11.85 – 12.6561 (mean 12.0221) |
| `longitude` | float64 | 0.0% | -16.2006 – -14.2 (mean -15.4152) |
| `geom_wkt` | object | 0.0% | |
| `priogrid_gid` | int64 | 0.0% | 146489.0 – 147930.0 (mean 146816.7727) |
| `country` | object | 0.0% | |
| `iso3` | object | 0.0% | |
| `country_id` | int64 | 0.0% | 404.0 – 404.0 (mean 404.0) |
| `region` | object | 0.0% | |
| `event_clarity` | int64 | 0.0% | 1.0 – 2.0 (mean 1.3182) |
| `date_prec` | int64 | 0.0% | 1.0 – 5.0 (mean 2.1818) |
| `date_start` | datetime64[ns] | 0.0% | |
| `date_end` | datetime64[ns] | 0.0% | |
| `deaths_a` | int64 | 0.0% | 0.0 – 16.0 (mean 2.3182) |
| `deaths_b` | int64 | 0.0% | |
| `deaths_civilians` | int64 | 0.0% | |
| `deaths_unknown` | int64 | 0.0% | |
| `best` | int64 | 0.0% | |
| `high` | int64 | 0.0% | |
| `low` | int64 | 0.0% | |
| `gwnoa` | int64 | 0.0% | |
| `esa_source` | object | 0.0% | |
| `esa_processed` | object | 0.0% | |
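The Schema table shows no sample values for `geom_wkt`, so the sketch below assumes it holds WKT point strings of the form `POINT (longitude latitude)`, which is the usual WKT axis order. Under that assumption, it parses a point with the standard library and checks it against the `latitude` and `longitude` ranges listed above. The function names and the sample string are illustrative.

```python
import re

# Matches e.g. "POINT (-15.5833 11.8583)"; WKT lists x (longitude) before y (latitude).
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(wkt):
    """Return (longitude, latitude) from a WKT POINT string, or None if it does not match."""
    match = _POINT_RE.match(wkt)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def within_card_bounds(lon, lat):
    """Check a coordinate against the latitude/longitude ranges reported in the Schema table."""
    return -16.2006 <= lon <= -14.2 and 11.85 <= lat <= 12.6561


# Sample string built from the median longitude and latitude reported on this card.
point = parse_wkt_point("POINT (-15.5833 11.8583)")
print(point)                       # (-15.5833, 11.8583)
print(within_card_bounds(*point))  # True
```

If the column turns out to contain other geometry types, a dedicated geometry library would be a better fit than a regular expression.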
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `id` | 24829.0 | 39324.0 | 26805.9091 | 24847.5 |
| `year` | 1992.0 | 2000.0 | 1998.0455 | 1998.0 |
| `type_of_violence` | 1.0 | 1.0 | 1.0 | 1.0 |
| `conflict_dset_id` | 375.0 | 410.0 | 405.2273 | 410.0 |
| `conflict_new_id` | 375.0 | 410.0 | 405.2273 | 410.0 |
| `dyad_dset_id` | 806.0 | 866.0 | 857.8182 | 866.0 |
| `dyad_new_id` | 806.0 | 866.0 | 857.8182 | 866.0 |
| `side_a_dset_id` | 70.0 | 73.0 | 70.4091 | 70.0 |
| `side_a_new_id` | 70.0 | 73.0 | 70.4091 | 70.0 |
| `side_b_dset_id` | 529.0 | 549.0 | 546.2727 | 549.0 |
| `side_b_new_id` | 529.0 | 549.0 | 546.2727 | 549.0 |
| `number_of_sources` | -1.0 | -1.0 | -1.0 | -1.0 |
| `where_prec` | 1.0 | 6.0 | 1.7273 | 1.0 |
| `latitude` | 11.85 | 12.6561 | 12.0221 | 11.8583 |
| `longitude` | -16.2006 | -14.2 | -15.4152 | -15.5833 |
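The four statistics in this table can be reproduced for any numeric column with the standard library. The sketch below rounds the mean to four decimal places, as the table does; the helper name `summarise` and the input values are hypothetical and are not dataset rows.

```python
from statistics import mean, median


def summarise(values):
    """Return min, max, mean (4 d.p.) and median for a list of numbers."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": round(mean(values), 4),
        "median": median(values),
    }


# Hypothetical precision codes for illustration.
print(summarise([1, 1, 1, 2, 6]))
# {'min': 1, 'max': 6, 'mean': 2.2, 'median': 1}
```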
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. Four columns with more than 80% missing values were removed: `source_office`, `source_date`, `source_headline`, and `gwnob`. Two columns were cast from string to numeric or datetime because more than 85% of their values parsed successfully. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
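The curation steps above can be sketched with the standard library alone. This is an illustration of the described rules, not the actual pipeline code: the function names, the case-insensitive marker matching, the use of `None` in place of `NaN`, and the rounding rule for the test-set size are all assumptions, and the sample table is hypothetical.

```python
import random
import re

# Markers listed in the Curation paragraph; case-insensitive matching is an assumption.
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def to_snake_case(name):
    """Lowercase a column name and collapse runs of non-alphanumerics into underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def clean_value(value):
    """Replace a recognised missing-value marker with None (standing in for NaN)."""
    if isinstance(value, str) and value.strip().lower() in MISSING_MARKERS:
        return None
    return value


def drop_sparse_columns(table, max_missing=0.80):
    """Keep only columns whose share of missing values does not exceed max_missing."""
    kept = {}
    for name, values in table.items():
        missing_share = sum(v is None for v in values) / len(values)
        if missing_share <= max_missing:
            kept[name] = values
    return kept


def cast_if_mostly_numeric(values, min_success=0.85):
    """Cast strings to float when more than min_success of non-missing values parse."""
    def parse(v):
        try:
            return float(v)
        except (TypeError, ValueError):
            return None

    present = [v for v in values if v is not None]
    parsed = [parse(v) for v in present]
    if not present or sum(p is not None for p in parsed) / len(present) <= min_success:
        return values  # leave the column as strings
    return [None if v is None else parse(v) for v in values]


def split_rows(rows, test_fraction=0.20, seed=42):
    """Shuffle with a fixed seed and cut off the last test_fraction of rows as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[: len(shuffled) - n_test], shuffled[len(shuffled) - n_test:]


# Hypothetical raw table (column name -> values); not taken from the dataset.
raw = {
    "Deaths A": ["0", "16", "2", "N/A", "1", "3", "0", "4", "2", "1"],
    "Source Office": ["-", "null", "Reuters", "unknown", "none",
                      "N/A", "-", "-", "no data", "#N/A"],
}
table = {to_snake_case(k): [clean_value(v) for v in vals] for k, vals in raw.items()}
table = drop_sparse_columns(table)  # drops source_office (90% missing)
table["deaths_a"] = cast_if_mostly_numeric(table["deaths_a"])
train, test = split_rows(range(10))
print(list(table), table["deaths_a"][:4], len(train), len(test))
```

The split here truncates the test-set size with `int()`; a pipeline that rounds differently would produce slightly different partition sizes for the same 80/20 ratio.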
---
## Limitations
- Data originates from HDX and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- One column has more than 20% missing values and should be treated with caution in modelling: `source_original` (22.7%).
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/ucdp-data-for-guinea-bissau) for the publisher's own methodology notes and caveats.
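The 20% rule in the list above can be applied directly to the null percentages in the Schema table. As a worked check, those percentages are consistent with 22 rows: 5 missing values give 22.7%, 3 give 13.6%, and 2 give 9.1%. The missing counts in this sketch are inferred from the percentages rather than stated on the card, and the helper name `null_pct` is illustrative.

```python
def null_pct(n_missing, n_rows):
    """Percentage of missing values, rounded to one decimal place as on the card."""
    return round(100 * n_missing / n_rows, 1)


# Missing counts inferred from the card's percentages, assuming 22 rows.
inferred_missing = {"source_original": 5, "adm_2": 3, "adm_1": 2}
pcts = {col: null_pct(n, 22) for col, n in inferred_missing.items()}
flagged = [col for col, pct in pcts.items() if pct > 20]

print(pcts)     # {'source_original': 22.7, 'adm_2': 13.6, 'adm_1': 9.1}
print(flagged)  # ['source_original']
```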
---
## Citation
```bibtex
@dataset{hdx_africa_ucdp_data_for_guinea_bissau,
title = {Guinea-Bissau - Data on Conflict Events},
author = {HDX},
year = {2026},
url = {https://data.humdata.org/dataset/ucdp-data-for-guinea-bissau},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider: electricsheepafrica



