GEM-Forest: A Global satellite EMbedding–based map of forests and tree crops for 2020 (GEM-Forest products, training & validation data, and model weights)
Source: DataCite Commons. Updated 2026-05-05; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.18921585
Description:
GEM-Forest v1.0 is a 10-meter resolution global dataset that uses satellite embeddings from the Google DeepMind AlphaEarth Foundations (AEF) model and a linear SVM classifier to provide a highly accurate map of forests and agricultural tree crops for the year 2020.
Unlike standard land cover products, GEM-Forest explicitly identifies agricultural tree crops (oil palm, rubber, coconut, other palms, and European tree crops such as olive orchards and fruit trees) to minimize confusion between forests and agricultural tree crops.
The approach demonstrates strong potential for temporal transferability across the 2017–2025 period covered by AEF embeddings (as of 1st May 2026). This capability allows multi-year applications and change detection based on models trained for a single year. GEM-Forest can support national and international policy and regulatory decisions, including the EU Deforestation Regulation (EUDR).
Data specifications
Spatial resolution: 10 m (0.00008983°).
Coordinate system: WGS 84 (EPSG:4326).
Temporal coverage: 2020.
Format: Cloud Optimized GeoTIFF (COG), LZW compressed.
Tiling: 5° x 5° tiles with a fixed array of 55,660 x 55,660 pixels.
Hierarchical bundling: Individual 5° x 5° tiles are grouped into larger ZIP archives representing geographic tiles of 30°x30°. They are named by their Northwest (NW) and Southeast (SE) corners to define the geographic extent (e.g., GEM_Forest_N060W090_to_N030E000.zip). Individual tiles are named by their NW corner coordinate (e.g., GEM_Forest_N45E010.tif).
Methodology and validation
The methodological baseline of GEM-Forest and its validation are documented in:
Paluba, D., Marsocci, V., Onačillová, K., Puerta Quintana, Y. T., and Hastie, A. (2026): GEM-Forest: A Global satellite EMbedding–based map of forests and tree crops for 2020, EGUsphere [preprint], https://doi.org/10.5194/egusphere-2026-1401.
Classification legend
Value: Description
0: Non-forest / other land cover / no data
1: Natural Forest
2: Tree crops (trained on oil palm, coconut, rubber and European tree crops)
Technical note on alignment
All tiles are aligned to a global grid of 55,660 x 55,660 pixels per 5-degree tile. Users may observe microscopic gaps (<3 cm) at tile boundaries at extreme zoom levels. These are binary floating-point artifacts that fall within the error bounds of the WGS 84 coordinate system and do not affect pixel-level spatial analysis or area calculations.
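Because the tiles use a geographic grid (EPSG:4326), the ground area of a pixel shrinks with latitude, so area calculations need a per-row pixel area rather than a flat 100 m². The sketch below shows one way to do this under stated assumptions: a spherical Earth with the mean radius 6,371,008.8 m (an approximation chosen here, not taken from the dataset documentation), row 0 as the northernmost row, and hypothetical function names. It works on plain nested lists so it stays self-contained; a real tile would be read window by window with a raster library.

```python
import math

EARTH_RADIUS_M = 6_371_008.8      # mean Earth radius; spherical approximation (assumption)
PIXEL_SIZE_DEG = 5 / 55_660       # pixel size implied by the 5° / 55,660-pixel grid


def pixel_area_m2(lat_top_deg):
    """Area of one pixel whose northern edge lies at lat_top_deg, on a sphere."""
    lat_top = math.radians(lat_top_deg)
    lat_bottom = math.radians(lat_top_deg - PIXEL_SIZE_DEG)
    dlon = math.radians(PIXEL_SIZE_DEG)
    # Exact area of a latitude/longitude cell on a sphere.
    return EARTH_RADIUS_M ** 2 * dlon * (math.sin(lat_top) - math.sin(lat_bottom))


def class_area_ha(rows, tile_nw_lat, class_value):
    """Sum the area (hectares) of pixels equal to class_value.

    rows: sequence of pixel rows, northernmost first (assumed layout).
    tile_nw_lat: latitude of the northern edge of the first row.
    """
    total_m2 = 0.0
    for i, row in enumerate(rows):
        count = sum(1 for value in row if value == class_value)
        if count:
            total_m2 += count * pixel_area_m2(tile_nw_lat - i * PIXEL_SIZE_DEG)
    return total_m2 / 10_000


# Toy 2 x 3 raster using the legend values (0 non-forest, 1 forest, 2 tree crops).
toy = [[0, 1, 2],
       [1, 1, 0]]
print(class_area_ha(toy, tile_nw_lat=0.0, class_value=1))
```

At 60° latitude the same pixel covers roughly half the area it does at the equator, which is why counting pixels alone would overstate high-latitude forest area.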
Dataset limitations & usage notes
For details, see Section 3.8 (Limitations) in Paluba et al. (2026).
Tree crop composition: The tree crop class primarily represents oil palm, rubber, coconut, and European fruit/olive orchards.
Exclusions: Notably, cocoa and coffee plantations were excluded from the training process due to the lack of high-quality open-access global training data and the technical difficulty of detecting "shaded" cultivation systems via Earth Observation data.
Regional performance variations: Tree crops may be under- or over-represented depending on the region. The highest uncertainty occurs in small-scale, fragmented landscapes (particularly in Europe), where the model may struggle to distinguish between non-forest vegetation and complex tree crop systems.
Validation data: While accuracy is high across the main global forest/non-forest dataset and eight tree crop validation datasets, the authors acknowledge that additional regional validation data would further refine the global assessment of tree crop distribution.
Classification of forest regrowth: Unlike the FAO and EUDR definitions, which consider "clear-cut areas where regrowth is expected" as forests, the GEM-Forest dataset classifies recently clear-cut areas and early-stage regrowth that do not meet the physical criteria (5 m height) as non-forest.
Urban land cover artifacts: Users may encounter commission errors (false positives) in the Forest class within certain urban environments. This could occur in specific regions where the GUB GAIA and FADSL datasets do not fully capture the complete extent of urban land cover.
Supporting data
This repository also contains:
training data with AEF labels (training_with_AEF.csv)
the main validation dataset with AEF labels (main_validation_with_AEF.csv)
the tree crop validation dataset with AEF labels (plant_validation_with_AEF.csv)
The saved_models_and_linear_weights.zip file includes:
trained ML models in .pkl files for Ridge regression, Logistic regression, linear SVM, kNN, Random forest and for neural networks (MLP10 and MLP100)
trained ML model file in a .json file for XGBoost
weights and intercepts of the linear ML models (RR, LR and linear SVM) in the Linear_weights.csv file
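Publishing the weights and intercepts means the linear models can be applied without loading the .pkl files: a linear classifier scores an embedding as a dot product with each class's weight vector plus an intercept, and the highest score wins. The sketch below illustrates that general technique only. The layout of Linear_weights.csv is not described here, so the toy weights, the 4-dimensional vectors standing in for real AEF embeddings, the one-score-per-class (one-vs-rest) setup, and the function names are all assumptions for illustration.

```python
def linear_scores(embedding, weights, intercepts):
    """Return one decision score per class: dot(w_k, x) + b_k."""
    return [
        sum(w * x for w, x in zip(class_weights, embedding)) + intercept
        for class_weights, intercept in zip(weights, intercepts)
    ]


def predict_class(embedding, weights, intercepts):
    """Return the index of the class with the highest score (one-vs-rest, assumed)."""
    scores = linear_scores(embedding, weights, intercepts)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy values only: three classes (0, 1, 2 as in the legend) and 4-D inputs.
toy_weights = [
    [1.0, 0.0, 0.0, 0.0],  # class 0
    [0.0, 1.0, 0.0, 0.0],  # class 1
    [0.0, 0.0, 1.0, 0.0],  # class 2
]
toy_intercepts = [0.0, -0.1, 0.2]
toy_embedding = [0.2, 0.9, 0.1, 0.3]

print(predict_class(toy_embedding, toy_weights, toy_intercepts))  # -> 1
```

Before applying the published weights this way, check the CSV header and the paper for the actual class ordering and for any feature scaling applied during training.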
Citation
Paluba, D., Marsocci, V., Onačillová, K., Puerta Quintana, Y. T., and Hastie, A. (2026): GEM-Forest: A Global satellite EMbedding–based map of forests and tree crops for 2020, EGUsphere [preprint], https://doi.org/10.5194/egusphere-2026-1401.
Paluba, D., Marsocci, V., Onačillová, K., Puerta Quintana, Y. T., & Hastie, A. (2026). GEM-Forest: A Global satellite EMbedding–based map of forests and tree crops for 2020 (GEM-Forest products, training & validation data, and model weights) (1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.18921586
Other sources
Google Earth Engine Asset (open a code snippet in the EE Code Editor)
GEE GEM-Forest Explorer App
Dataset as an ArcGIS REST API (a web service that can be loaded in QGIS, ArcGIS Pro, etc.):
https://tiles.arcgis.com/tiles/LPm07959azIAvFRD/arcgis/rest/services/GEM_Forest_v1_0/MapServer
Technical Metadata
This dataset is accompanied by an ISO 19115-1:2014 compliant metadata record in XML format (Metadata_ISO19139_GEM-Forest_2020.xml). This file provides the global spatial extent, lineage, and technical specifications of the methodology used.
Provider:
Zenodo
Created:
2026-05-01



