electricsheepafrica/africa-unep-wdpca-moz
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- tabular-regression
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- environment
- geodata
- moz
pretty_name: "Protected and Conserved Areas (WDPCA) in Mozambique"
dataset_info:
  splits:
  - name: train
    num_examples: 51
  - name: test
    num_examples: 12
---
# Protected and Conserved Areas (WDPCA) in Mozambique
**Publisher:** The UN Environment Programme World Conservation Monitoring Centre (UNEP-WCMC) · **Source:** [HDX](https://data.humdata.org/dataset/unep_wdpca_moz) · **License:** `cc-by-igo` · **Updated:** 2026-03-03
---
## Abstract
The World Database on Protected and Conserved Areas (WDPCA) combines the formerly separate World Database on Protected Areas (WDPA) and World Database on Other Effective Area-based Conservation Measures (WD-OECM). It is the most comprehensive global database of marine and terrestrial protected areas and other effective area-based conservation measures, and it is updated monthly. As one of the key global biodiversity datasets, it is widely used by scientists, businesses, governments, international secretariats, and others to inform planning, policy decisions, and management.
The WDPCA is part of the Protected Planet Initiative, a joint product of the UN Environment Programme and the International Union for Conservation of Nature (IUCN). The UN Environment Programme World Conservation Monitoring Centre (UNEP-WCMC) compiles and manages the WDPCA in collaboration with governments and other stakeholders. Data and information on the world's protected and conserved areas compiled in the WDPCA are used for reporting on progress towards Target 3 of the Kunming-Montreal Global Biodiversity Framework, which calls for 30% of the world's land and waters to be effectively conserved by 2030.
Additionally, the WDPCA is used for reporting to the UN to track progress towards the 2030 Sustainable Development Goals, tracking of core indicators of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES), and providing information for other international assessments and reports including the Global Biodiversity Outlook. UNEP-WCMC and IUCN periodically release the Protected Planet Report on the status of the world's protected and conserved areas.
Many platforms are incorporating the WDPCA to provide integrated information to diverse users, including businesses and governments, in a range of sectors. For example, the WDPCA is included in the Integrated Biodiversity Assessment Tool (IBAT), an innovative decision support tool that gives commercial users easy access to up-to-date information that allows them to identify biodiversity risks and opportunities within a project boundary.
The reach of the WDPCA is further enhanced by the UN Biodiversity Lab as well as services developed by other parties, such as the Global Forest Watch and the Digital Observatory for Protected Areas, which provide decision makers with access to monitoring and alert systems that allow whole landscapes to be managed better. Together, these applications of the WDPCA demonstrate the growing value and significance of the Protected Planet initiative.
Each row in this dataset is an individual site record. The data was last updated on HDX on 2026-03-03. Geographic scope: **MOZ**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Environment (protected and conserved areas) |
| **Unit of observation** | Individual-level records |
| **Rows (total)** | 64 |
| **Columns** | 39 (11 numeric, 28 categorical, 0 datetime) |
| **Train split** | 51 rows |
| **Test split** | 12 rows |
| **Geographic scope** | MOZ |
| **Publisher** | The UN Environment Programme World Conservation Monitoring Centre (UNEP-WCMC) |
| **HDX last updated** | 2026-03-03 |
---
## Variables
**Geographic** — `site_type` (PA), `desig_type` (National, International), `status_yr` (range 0.0–2019.0), `gov_type`, `own_type` and 4 others.
**Identifier / Metadata** — `objectid` (range 527.0–243151.0), `site_id` (range 799.0–555705347.0), `site_pid` (799, 800, 342681), `name_eng` (Chimanimani, Niassa, Gilé), `name` (Chimanimani, Niassa, Gilé) and 3 others.
**Other** — `desig` (Coutada, Reserva Florestal, Parque Nacional), `desig_eng` (Hunting Reserve, Forest Reserve, National Park), `iucn_cat` (VI, IV, II), `int_crit` (Not Applicable, (i);(ii);(iii);(vi);(viii), (i);(ii);(iii);(iv);(vii)), `realm` (Terrestrial, Coastal, Marine) and 17 others.
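
The `int_crit` samples above pack several criteria into one semicolon-delimited string, such as `(i);(ii);(iii);(vi);(viii)`, or hold the placeholder `Not Applicable`. A minimal sketch of splitting such a value into a list follows. The helper name `parse_int_crit` is hypothetical, and treating `Not Applicable` as an empty list is an assumption rather than documented behaviour.

```python
from typing import List, Optional


def parse_int_crit(value: Optional[str]) -> List[str]:
    """Split a semicolon-delimited criteria string into bare roman numerals.

    Assumption: "Not Applicable" (any casing) and missing values mean
    that no criteria apply, so they map to an empty list.
    """
    if value is None or value.strip().lower() == "not applicable":
        return []
    # "(i);(ii)" -> ["(i)", "(ii)"] -> ["i", "ii"]
    return [part.strip().strip("()") for part in value.split(";") if part.strip()]


print(parse_int_crit("(i);(ii);(iii);(vi);(viii)"))
print(parse_int_crit("Not Applicable"))
```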
---
## Quick Start
```python
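from datasets import load_dataset

# Download both splits from the Hugging Face Hub
ds = load_dataset("electricsheepafrica/africa-unep-wdpca-moz")

# Convert each split to a pandas DataFrame
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)   # (rows, columns)
print(train.head())  # first five rows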
```
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `objectid` | int64 | 0.0% | 527.0 – 243151.0 (mean 119035.5781) |
| `site_id` | int64 | 0.0% | 799.0 – 555705347.0 (mean 243197314.375) |
| `site_pid` | object | 0.0% | 799, 800, 342681 |
| `site_type` | object | 0.0% | PA |
| `name_eng` | object | 0.0% | Chimanimani, Niassa, Gilé |
| `name` | object | 0.0% | Chimanimani, Niassa, Gilé |
| `desig` | object | 0.0% | Coutada, Reserva Florestal, Parque Nacional |
| `desig_eng` | object | 0.0% | Hunting Reserve, Forest Reserve, National Park |
| `desig_type` | object | 0.0% | National, International |
| `iucn_cat` | object | 0.0% | VI, IV, II |
| `int_crit` | object | 0.0% | Not Applicable, (i);(ii);(iii);(vi);(viii), (i);(ii);(iii);(iv);(vii) |
| `realm` | object | 0.0% | Terrestrial, Coastal, Marine |
| `rep_m_area` | float64 | 0.0% | 0.0 – 1430.0 (mean 32.9375) |
| `gis_m_area` | float64 | 0.0% | 0.0 – 5790.1945 (mean 201.0486) |
| `rep_area` | float64 | 0.0% | 0.0 – 42000.0 (mean 3630.9301) |
| `gis_area` | float64 | 0.0% | 0.642 – 38188.5813 (mean 4120.3976) |
| `no_take` | object | 0.0% | |
| `no_tk_area` | float64 | 0.0% | 0.0 – 0.0 (mean 0.0) |
| `status` | object | 0.0% | |
| `status_yr` | int64 | 0.0% | 0.0 – 2019.0 (mean 1150.8594) |
| `restrict` | object | 0.0% | |
| `gov_type` | object | 0.0% | |
| `verif` | object | 0.0% | |
| `inlnd_wtrs` | object | 0.0% | |
| `own_type` | object | 0.0% | |
| `mang_auth` | object | 0.0% | |
| `mang_plan` | object | 0.0% | |
| `cons_obj` | object | 0.0% | |
| `supp_info` | object | 0.0% | |
| `metadataid` | int64 | 0.0% | 1828.0 – 2123.0 (mean 1832.6094) |
| `prnt_iso3` | object | 0.0% | |
| `iso3` | object | 0.0% | |
| `govsubtype` | object | 0.0% | |
| `ownsubtype` | object | 0.0% | |
| `oecm_asmt` | object | 0.0% | |
| `shape_area` | float64 | 0.0% | 0.0001 – 3.1711 (mean 0.3495) |
| `shape_length` | float64 | 0.0% | 0.0328 – 14.9986 (mean 3.0509) |
| `esa_source` | object | 0.0% | |
| `esa_processed` | object | 0.0% | |
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `objectid` | 527.0 | 243151.0 | 119035.5781 | 77538.5 |
| `site_id` | 799.0 | 555705347.0 | 243197314.375 | 342677.5 |
| `rep_m_area` | 0.0 | 1430.0 | 32.9375 | 0.0 |
| `gis_m_area` | 0.0 | 5790.1945 | 201.0486 | 0.0 |
| `rep_area` | 0.0 | 42000.0 | 3630.9301 | 821.0 |
| `gis_area` | 0.642 | 38188.5813 | 4120.3976 | 1553.2703 |
| `no_tk_area` | 0.0 | 0.0 | 0.0 | 0.0 |
| `status_yr` | 0.0 | 2019.0 | 1150.8594 | 1958.5 |
| `metadataid` | 1828.0 | 2123.0 | 1832.6094 | 1828.0 |
| `shape_area` | 0.0001 | 3.1711 | 0.3495 | 0.1325 |
| `shape_length` | 0.0328 | 14.9986 | 3.0509 | 2.3031 |
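
The `status_yr` row has a minimum of 0, a mean of 1150.8594, and a median of 1958.5, which suggests that some rows store 0 in place of a year. The sketch below assumes that 0 means "year not reported"; that reading is an assumption and should be checked against the publisher's documentation. The helper `status_year_or_na` is hypothetical, and the toy values are made up for illustration.

```python
import pandas as pd


def status_year_or_na(years: pd.Series) -> pd.Series:
    """Return years as nullable integers, with 0 replaced by <NA>.

    Assumption: a value of 0 marks an unreported year.
    """
    return years.astype("Int64").mask(years == 0)


# Toy values, not taken from the dataset
toy = pd.DataFrame({"status_yr": [0, 1960, 0, 2019, 1973]})
toy["status_yr_clean"] = status_year_or_na(toy["status_yr"])

print(toy["status_yr"].mean())        # mean pulled down by the zeros
print(toy["status_yr_clean"].mean())  # mean over reported years only
```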
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
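
The steps above can be sketched roughly as follows. This is not the curation pipeline itself: the function names, the case-insensitive marker matching, and the use of `DataFrame.sample` for the split are all assumptions, and the toy frame stands in for the raw HDX table with made-up values.

```python
import re

import pandas as pd

# Markers listed above; matching them case-insensitively is an assumption.
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def to_snake_case(name: str) -> str:
    """Lowercase a column name and join its alphanumeric runs with underscores."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Rename columns to snake_case and replace missing-value markers with NaN."""
    out = df.rename(columns=to_snake_case)
    for col in out.columns:
        is_marker = out[col].map(
            lambda v: isinstance(v, str) and v.strip().lower() in MISSING_MARKERS
        ).astype(bool)
        out[col] = out[col].where(~is_marker)
    return out


def split_80_20(df: pd.DataFrame, seed: int = 42):
    """Sample 80% of the rows for train; the remaining rows form test."""
    train = df.sample(frac=0.8, random_state=seed)
    test = df.drop(train.index)
    return train, test


# Toy frame standing in for the raw table (values are made up)
raw = pd.DataFrame({
    "Site ID": range(1, 11),
    "GOV_TYPE": ["Federal", "N/A", "unknown", "Federal", "-",
                 "Federal", "Not Reported", "null", "Federal", "Federal"],
})
cleaned = clean(raw)
train, test = split_80_20(cleaned)
print(list(cleaned.columns), int(cleaned["gov_type"].isna().sum()), len(train), len(test))
```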
---
## Limitations
- Data originates from The UN Environment Programme World Conservation Monitoring Centre (UNEP-WCMC) and has not been independently validated by Electric Sheep Africa (ESA).
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/unep_wdpca_moz) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_unep_wdpca_moz,
title = {Protected and Conserved Areas (WDPCA) in Mozambique},
author = {The UN Environment Programme World Conservation Monitoring Centre (UNEP-WCMC)},
year = {2026},
url = {https://data.humdata.org/dataset/unep_wdpca_moz},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*