High-resolution wall-to-wall time series predictions of seasonal maize area and yield for Rwanda over 2019-2023
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10659094
Description:
This is the companion dataset to publication {TBD}. It contains 1) seasonal composites of predicted maize cover and yield at 10 m resolution in Rwanda for two annual agricultural seasons over five years, 2) scripts for the end-to-end machine learning pipeline that produces these data products, and 3) data or references needed as inputs to the pipeline.
1) Maize cover and yield seasonal composites
The data are provided here as netCDF4 files with four dimensions: x, y, band, and season. They can also be accessed as Google Earth Engine ImageCollections at:
https://code.earthengine.google.com/?asset=projects/b2p-geospatial/assets/lulc_classifier_composite
https://code.earthengine.google.com/?asset=projects/b2p-geospatial/assets/maize_yield_composite
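The asset ID of each collection is the value of the `asset` query parameter in the links above. As a minimal sketch using only the Python standard library, it can be extracted as shown below. The helper name `asset_id_from_url` is hypothetical, and the commented-out loading step assumes the `earthengine-api` package and an authenticated Earth Engine session.

```python
from urllib.parse import parse_qs, urlparse

# Code Editor links as listed in this dataset description.
COMPOSITE_URLS = [
    "https://code.earthengine.google.com/?asset=projects/b2p-geospatial/assets/lulc_classifier_composite",
    "https://code.earthengine.google.com/?asset=projects/b2p-geospatial/assets/maize_yield_composite",
]


def asset_id_from_url(url: str) -> str:
    """Return the Earth Engine asset ID stored in the `asset` query parameter."""
    query = parse_qs(urlparse(url).query)
    return query["asset"][0]


ASSET_IDS = [asset_id_from_url(url) for url in COMPOSITE_URLS]

# With earthengine-api installed and authenticated, a collection could then
# be opened with, for example:
#   import ee
#   ee.Initialize()
#   lulc = ee.ImageCollection(ASSET_IDS[0])
```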
Land cover and maize classification
The land cover classification file is found at data/composites/lulc_classifier_Rwanda_2019to2023.nc.
The land cover classification images contain 3 bands/variables: maizeProb, the raw predicted probability that the pixel is maize, as given by the gradient boosted tree model; majorityClass, the categorical land cover class with the highest predicted probability among the nine classes in the respective pixel; and optimalClass, the categorical land cover class adjusted to agree with national statistics for expected maize area.
The land cover classes map to the raster values as follows:
{ 1: 'maize', 2: 'nonmaize_annual', 3: 'nonmaize_perennial', 4: 'scrub_shrub_land', 5: 'forest', 6: 'flooded_vegetation', 7: 'water', 8: 'structure', 9: 'bare'}
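As a small illustration, the mapping can be used to decode raster values from majorityClass or optimalClass into class names. The helper names below are hypothetical, and labelling unmapped values as "unknown" is an assumption, since the nodata value of the files is not stated here.

```python
from collections import Counter

# Raster value -> land cover class, as given in the mapping above.
LAND_COVER_CLASSES = {
    1: "maize",
    2: "nonmaize_annual",
    3: "nonmaize_perennial",
    4: "scrub_shrub_land",
    5: "forest",
    6: "flooded_vegetation",
    7: "water",
    8: "structure",
    9: "bare",
}


def class_name(value: int) -> str:
    """Decode one raster value; unmapped values are labelled 'unknown' (assumption)."""
    return LAND_COVER_CLASSES.get(value, "unknown")


def class_counts(values) -> Counter:
    """Count pixels per class name for a flat iterable of raster values."""
    return Counter(class_name(v) for v in values)


# Toy example: six pixels, two of them maize.
counts = class_counts([1, 1, 5, 7, 2, 0])
```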
The dataset includes 5 years (2019-2023) and 10 seasons - the available time period at time of publication. In Rwanda, maize is typically planted and harvested during two distinct agricultural seasons per year: Season A from September to February and Season B from March to June. Therefore the seasons in the data are: 2019_Season_A, 2019_Season_B, 2020_Season_A, 2020_Season_B, 2021_Season_A, 2021_Season_B, 2022_Season_A, 2022_Season_B, 2023_Season_A, 2023_Season_B.
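The season labels follow a regular pattern, so they can be generated rather than typed out. This is a minimal sketch; the function name `season_labels` is hypothetical and not part of the published scripts.

```python
def season_labels(first_year: int = 2019, last_year: int = 2023) -> list[str]:
    """Build labels such as '2019_Season_A' for seasons A and B of each year."""
    return [
        f"{year}_Season_{season}"
        for year in range(first_year, last_year + 1)
        for season in ("A", "B")
    ]


SEASONS = season_labels()
```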
Maize yield
The maize yield file is found at data/composites/maize_yield_Rwanda_2019to2023.nc.
Each image in the yield composites also has 3 bands/variables: maizeYield, the model's continuous predicted yield (kg/ha) in each pixel regardless of land class; maizeYield_majorityClass, the predicted maize yield masked to pixels whose majority class is maize; and maizeYieldAdj_optimalClass, the raw predicted yields masked to the optimal maize classification layer and normalized to national statistics.
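The relationship between maizeYield and maizeYield_majorityClass can be illustrated on a toy grid. The sketch below assumes that maize is class 1 (per the class mapping), that masked pixels are represented as NaN (an assumption; the files may use a different nodata value), and that each pixel covers 10 m x 10 m = 0.01 ha. It does not reproduce the normalization to national statistics, and the function names are hypothetical.

```python
import math

MAIZE_CLASS = 1                      # raster value for 'maize'
PIXEL_AREA_HA = (10 * 10) / 10_000   # one 10 m pixel = 100 m^2 = 0.01 ha


def mask_yield(yield_kg_ha, land_class):
    """Keep yield where the land class is maize; set other pixels to NaN."""
    return [
        [y if c == MAIZE_CLASS else math.nan for y, c in zip(y_row, c_row)]
        for y_row, c_row in zip(yield_kg_ha, land_class)
    ]


def total_production_kg(masked_yield_kg_ha):
    """Sum per-pixel production (yield in kg/ha times pixel area in ha)."""
    return sum(
        y * PIXEL_AREA_HA
        for row in masked_yield_kg_ha
        for y in row
        if not math.isnan(y)
    )


# Toy 2 x 2 grid: two maize pixels, one forest pixel, one water pixel.
toy_yield = [[1500.0, 1200.0], [900.0, 2000.0]]
toy_class = [[1, 5], [1, 7]]

masked = mask_yield(toy_yield, toy_class)
production = total_production_kg(masked)
```

Multiplying by the pixel area converts a per-hectare yield into an absolute quantity per pixel, which is what makes summed values comparable to area-based survey totals.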
The dataset includes the same seasons as the classification product; see above for a description.
2) End-to-end machine learning pipeline
Unless otherwise stated, all earth observation imagery, analysis, and outputs were hosted in the Google Earth Engine (GEE) environment and developed with the Earth Engine Python API in Python v3.10. To set up a local conda environment, use the scripts/environment.yml file. The user must have Google Cloud Storage (GCS) and Google Earth Engine (GEE) accounts. At this scale, the pipeline will incur some processing and storage fees, although Google offers a free trial to all new users and the total cost of the high-resolution wall-to-wall predictions is nominal (~$20 for one season).
The scripts needed to perform the pipeline are located in the scripts folder.
The files contained in the scripts/helpers directory are called by various subsequent scripts and do not need to be run interactively by the user.
Run the scripts in the order described below. The user should pause after running each script and confirm that all outputs were created and loaded to GCS before continuing the pipeline; for some steps this may take hours to days depending on processing speed.
Google Cloud Storage and Earth Engine set-up
Users should specify the names of the bucket and asset project that were chosen during set up of their GCS and GEE environments in the Objects section of scripts/helpers/maize_pipeline_0_workspace.py.
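As a rough sketch of what such workspace settings might look like, the snippet below defines a bucket name and an asset project and derives storage URIs and Earth Engine asset IDs from them. The variable and function names, and the placeholder values, are hypothetical; the actual names in scripts/helpers/maize_pipeline_0_workspace.py may differ. The `projects/<project>/assets/<name>` form matches the asset links given earlier in this description.

```python
# Hypothetical placeholders: replace with the names chosen during
# GCS and GEE set-up.
GCS_BUCKET = "my-maize-bucket"
EE_ASSET_PROJECT = "my-ee-project"


def gcs_uri(relative_path: str) -> str:
    """Build a gs:// URI for an object inside the configured bucket."""
    return f"gs://{GCS_BUCKET}/{relative_path.lstrip('/')}"


def ee_asset_id(asset_name: str) -> str:
    """Build an Earth Engine asset ID inside the configured asset project."""
    return f"projects/{EE_ASSET_PROJECT}/assets/{asset_name}"


# Asset names used by the pipeline, as listed in the input data section.
DISTRICTS_ASSET = ee_asset_id("districts_fc")
AEZ_ASSET = ee_asset_id("aez_rwanda")
MODELS_URI = gcs_uri("data/models")
```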
Pipeline set-up
In scripts/pipeline_setup, you will find the following scripts to perform data preparation of inputs into model building and prediction.
maize_pipeline_1_clean_training_data.py - Cleans and merges all available crop label and yield data for model training and validation
maize_pipeline_2_dwnld_data_training.py - Downloads satellite-derived and auxiliary features at training data points for model building
maize_pipeline_3_dwnld_data_inference.py - Downloads satellite-derived and auxiliary features at every 10 m pixel in Rwanda on a district-wise basis for prediction
Land cover and maize classification
In scripts/maize_classification, you will find the following scripts to perform model building, prediction, and post-processing for the classification of land cover type and maize cover.
maize_classifier_1_feature_selection.py - Selects features subset for land cover classification with mutual information score or variable importance
maize_classifier_2_build_model.py - Builds gradient boosted tree model for land cover classification from training data
maize_classifier_3_prediction.py - Applies model for land cover classification to every 10 m pixel in Rwanda by season and district
maize_classifier_4_postprocess.py - Mosaics district-wise predictions and normalizes maize cover predictions to national agricultural statistics
Maize yield
In scripts/maize_yield, you will find the following scripts to perform model building, prediction, and post-processing for maize yield estimation.
maize_yield_1_build_model.py - Builds gradient boosted tree model and performs bias correction for maize yield estimation from training data
maize_yield_2_prediction.py - Applies model for maize yield estimation to every 10 m pixel in Rwanda by season and district
maize_yield_3_postprocess.py - Mosaics district-wise predictions and normalizes maize yield predictions to national agricultural statistics
If you are running the entire pipeline with refreshed training data and model building, run each of these scripts in order. By default, the scripts run all A and B seasons from 2019A to the current season. If you only wish to re-run or update seasonal predictions from the existing classification or yield model, run maize_pipeline_3_dwnld_data_inference.py to download the seasonal feature data across Rwanda, then maize_classifier_3_prediction.py and maize_classifier_4_postprocess.py for classification predictions, or maize_yield_2_prediction.py and maize_yield_3_postprocess.py for yield predictions, making sure to specify the season(s) of interest in each script. To do this, you also need a copy of the previously built models in your GCS bucket (provided at data/models).
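The two ways of running the pipeline can be summarized as ordered script lists. The sketch below only encodes the order described above; the mode names and the `scripts_to_run` helper are hypothetical, and it does not launch anything, since each step should be checked for completed outputs before the next one is started.

```python
PIPELINE_SETUP = [
    "scripts/pipeline_setup/maize_pipeline_1_clean_training_data.py",
    "scripts/pipeline_setup/maize_pipeline_2_dwnld_data_training.py",
    "scripts/pipeline_setup/maize_pipeline_3_dwnld_data_inference.py",
]
CLASSIFICATION = [
    "scripts/maize_classification/maize_classifier_1_feature_selection.py",
    "scripts/maize_classification/maize_classifier_2_build_model.py",
    "scripts/maize_classification/maize_classifier_3_prediction.py",
    "scripts/maize_classification/maize_classifier_4_postprocess.py",
]
YIELD = [
    "scripts/maize_yield/maize_yield_1_build_model.py",
    "scripts/maize_yield/maize_yield_2_prediction.py",
    "scripts/maize_yield/maize_yield_3_postprocess.py",
]


def scripts_to_run(mode: str) -> list[str]:
    """Return the ordered scripts for a run mode (mode names are hypothetical)."""
    if mode == "full":
        # Refreshed training data and model building: every script, in order.
        return PIPELINE_SETUP + CLASSIFICATION + YIELD
    if mode == "classification_predictions":
        # Reuse the existing classifier: inference download, predict, post-process.
        return [PIPELINE_SETUP[2]] + CLASSIFICATION[2:]
    if mode == "yield_predictions":
        # Reuse the existing yield model: inference download, predict, post-process.
        return [PIPELINE_SETUP[2]] + YIELD[1:]
    raise ValueError(f"Unknown mode: {mode!r}")


full_run = scripts_to_run("full")
yield_only = scripts_to_run("yield_predictions")
```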
3) Input data into machine learning pipeline
A description of datasets that must be sourced outside of the GEE platform is provided below. When available, the primary data source is also included in the directory data/baselayers. All other data, including Sentinel-2 imagery, auxiliary data, and other existing global land cover classification products, are hosted on GEE and called by the scripts directly. All datasets were last accessed on 12 March 2024.
Administrative and geological boundaries
World Countries - Downloaded from The World Bank Official Boundaries and included here at data/baselayers/World_Countries.
Rwanda district boundaries - Downloaded from The World Bank Rwanda Admin Boundaries And Villages and included here at data/baselayers/WB_NISR_2018. This should be loaded into a FeatureCollection GEE asset named districts_fc for use in the pipeline.
Rwanda agro-ecological zones - Downloaded from Nzeyimana, Hartemink & Geissen (2016) and included here at data/baselayers/MINAGRI_AEZ_1980. This should be loaded into a FeatureCollection GEE asset named aez_rwanda for use in the pipeline.
Global land cover classification product
Microsoft/Impact Observatory LULC - Although the 10m Annual Land Use Land Cover (9-class) V1 product contains data from 2017-2022, only the LULC map from the year 2021 was used, provided here at data/baselayers/impactobs_lulc_rwa_2021.tif. This should be loaded into an ImageCollection GEE asset named impact_obs_lulc for use in the pipeline.
(The others - Dynamic World and ESA's WorldCover - are hosted on GEE directly.)
Land cover labels and maize yield crop cuttings
One Acre Fund - Contact authors to request access as this dataset is not hosted publicly.
RTI International - The original source of this data (Radiant MLHub) has been discontinued, but users may be able to access it via Source Cooperative. The data is also included here at data/baselayers/rti_rwanda_crop_type_labels.
Crop Harvest - Downloaded from Tseng et al. (2021, v13) and included here at data/baselayers/CropHarvest. These data points were ultimately not used in the training data, but are provided here for others that may find this dataset useful in their context.
Rwanda national agricultural surveys
National Institute of Statistics Rwanda (NISR) - Downloaded from NISR Seasonal Agricultural Survey, with existing seasons included here at data/baselayers/NISR_Seasonal_Ag_Surveys. For each subsequent season, the user will have to download the spreadsheet of survey results from the NISR webpage (linked) and add the respective season to the get_nisr_data function in the helpers/maize_pipeline_0_helpers_postprocess.py script to clean and read in the data for use in the pipeline.
Created:
2024-05-02



