CY-Bench: A comprehensive benchmark dataset for subnational crop yield forecasting
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/11502142
Resource description:
Overview
CY-Bench is a dataset and benchmark for subnational crop yield forecasting, covering the major crop-growing countries of the world for maize and wheat. By subnational, we mean the administrative level at which yield statistics are published. When statistics are available for multiple levels, we pick the highest resolution. The dataset combines subnational yield statistics with relevant predictors, such as growing-season weather indicators, remote sensing indicators, evapotranspiration, soil moisture indicators, and static soil properties. CY-Bench has been designed and curated by agricultural experts, climate scientists, and machine learning researchers from the AgML Community, with the aim of facilitating model intercomparison across the diverse agricultural systems around the globe in conditions as close as possible to real-world operationalization. Ultimately, by lowering the barrier to entry for ML researchers in this crucial application area, CY-Bench will facilitate the development of improved crop forecasting tools that can be used to support decision-makers in food security planning worldwide.
* Crops: Wheat & Maize
* Spatial Coverage: Wheat (29 countries), Maize (38 countries). See CY-Bench Summary for the list of countries.
* Temporal Coverage: Varies. See CY-Bench Summary.
Data
Data format
The benchmark data is organized as a collection of CSV files (with the exception of location information, see below), with each file representing a specific category of variable for a particular country. Each CSV file is named according to the category and the country it pertains to, facilitating easy identification and retrieval. The data within each CSV file is structured in tabular format, where rows represent observations and columns represent different predictors related to a category of variable.
Data content
All data files are provided as .csv.
| Data | Description | Variables (units) | Temporal Resolution | Data Source (Reference) |
|---|---|---|---|---|
| crop_calendar | Start and end of growing season | sos (day of the year), eos (day of the year) | Static | World Cereal (Franch et al, 2022) |
| fpar | Fraction of absorbed photosynthetically active radiation | fpar (%) | Dekadal (3 times a month; 1-10, 11-20, 21-31) | European Commission's Joint Research Centre (EC-JRC, 2024) |
| ndvi | Normalized difference vegetation index | - | Approximately weekly | MOD09CMG (Vermote, 2015) |
| meteo | Temperature, precipitation (prec), radiation, potential evapotranspiration (et0), climatic water balance (= prec - et0) | tmin (°C), tmax (°C), tavg (°C), prec (mm), et0 (mm), cwb (mm), rad (J m-2 day-1) | Daily | AgERA5 (Boogaard et al, 2022), FAO-AQUASTAT for et0 (FAO-AQUASTAT, 2024) |
| soil_moisture | Surface soil moisture, rootzone soil moisture | ssm (kg m-2), rsm (kg m-2) | Daily | GLDAS (Rodell et al, 2004) |
| soil | Available water capacity, bulk density, drainage class | awc (cm m-1), bulk_density (kg dm-3), drainage_class (category) | Static | WISE Soil database (Batjes, 2016) |
| yield | End-of-season yield | yield (t ha-1) | Yearly | Various country or region specific sources (see crop_statistics_... in https://github.com/BigDataWUR/AgML-CY-Bench/tree/main/data_preparation) |
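The dekadal convention (days 1-10, 11-20, 21 to month end) and the climatic water balance (cwb = prec - et0) from the table can be illustrated with a short sketch. This is only an illustration: the sample values are made up, the helper names are hypothetical, and summing daily values per dekad is an assumption about how one might aggregate, not a description of how CY-Bench was prepared.

```python
from collections import defaultdict
from datetime import date


def dekad_of(day: date) -> int:
    """Return the dekad of the month (1, 2 or 3) for a calendar date."""
    # Days 1-10 -> 1, 11-20 -> 2, 21 to month end -> 3 (min() caps day 31 at 3).
    return min((day.day - 1) // 10 + 1, 3)


def climatic_water_balance(prec_mm: float, et0_mm: float) -> float:
    """cwb = prec - et0, both in mm."""
    return prec_mm - et0_mm


def dekadal_sums(daily):
    """Sum (date, value) pairs into (year, month, dekad) bins."""
    totals = defaultdict(float)
    for day, value in daily:
        totals[(day.year, day.month, dekad_of(day))] += value
    return dict(totals)


# Made-up daily (date, prec in mm, et0 in mm) records, for illustration only.
sample = [
    (date(2020, 5, 9), 4.0, 3.0),
    (date(2020, 5, 10), 0.0, 3.5),
    (date(2020, 5, 11), 12.0, 2.5),
    (date(2020, 5, 31), 1.0, 4.0),
]
daily_cwb = [(d, climatic_water_balance(p, e)) for d, p, e in sample]
print(dekadal_sums(daily_cwb))
```

Binning by (year, month, dekad) keeps the third dekad variable in length (8 to 11 days), which matches the "21-31" convention in the table.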
Folder structure
cybench-data: The CY-Bench dataset is structured first by crop type and then by country. For each country, the folder name follows the ISO 3166-1 alpha-2 two-character code. A separate .csv file is available for each predictor dataset and for the crop calendar, as shown below. The CSV files are named to reflect the corresponding country and crop type, e.g. **variable_croptype_country.csv**.

```
CY-Bench
│
└─── maize
│    │
│    └─── AO
│    │     -- crop_calendar_maize_AO.csv
│    │     -- fpar_maize_AO.csv
│    │     -- meteo_maize_AO.csv
│    │     -- ndvi_maize_AO.csv
│    │     -- soil_maize_AO.csv
│    │     -- soil_moisture_maize_AO.csv
│    │     -- yield_maize_AO.csv
│    │
│    └─── AR
│          -- crop_calendar_maize_AR.csv
│          -- fpar_maize_AR.csv
│          -- ...
│
└─── wheat
│    │
│    └─── AR
│    │     -- crop_calendar_wheat_AR.csv
│    │     -- fpar_wheat_AR.csv
│    │     ...
```
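The naming convention can be turned into a small path helper. The sketch below is a convenience illustration, not part of the CY-Bench tooling: the function names and the root folder name `CY-Bench` are assumptions, and the list of variable categories is taken from the data table.

```python
from pathlib import Path

# Variable categories listed in the data table.
VARIABLES = ("crop_calendar", "fpar", "meteo", "ndvi", "soil", "soil_moisture", "yield")


def cybench_csv_path(root, variable, crop, country):
    """Build <root>/<crop>/<country>/<variable>_<crop>_<country>.csv."""
    if variable not in VARIABLES:
        raise ValueError(f"unknown variable category: {variable!r}")
    return Path(root) / crop / country / f"{variable}_{crop}_{country}.csv"


def parse_csv_name(filename):
    """Split a file name such as 'soil_moisture_maize_AO.csv' into its parts."""
    stem = Path(filename).stem
    # The variable itself may contain underscores, so split from the right.
    variable, crop, country = stem.rsplit("_", 2)
    return variable, crop, country


print(cybench_csv_path("CY-Bench", "meteo", "maize", "AO").as_posix())
print(parse_csv_name("soil_moisture_maize_AO.csv"))
```

Splitting from the right matters because categories such as `crop_calendar` and `soil_moisture` contain underscores, while the crop and country parts do not.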
Example: CSV data content for maize in country X

```
X
└─── crop_calendar_maize_X.csv
│     -- crop_name (name of the crop)
│     -- adm_id (unique identifier for a subnational unit)
│     -- sos (start of crop season)
│     -- eos (end of crop season)
│
└─── fpar_maize_X.csv
│     -- crop_name
│     -- adm_id
│     -- date (in the format YYYYMMdd)
│     -- fpar
│
└─── meteo_maize_X.csv
│     -- crop_name
│     -- adm_id
│     -- date (in the format YYYYMMdd)
│     -- tmin (minimum temperature)
│     -- tmax (maximum temperature)
│     -- prec (precipitation)
│     -- rad (radiation)
│     -- tavg (average temperature)
│     -- et0 (potential evapotranspiration)
│     -- cwb (climatic water balance)
│
└─── ndvi_maize_X.csv
│     -- crop_name
│     -- adm_id
│     -- date (in the format YYYYMMdd)
│     -- ndvi
│
└─── soil_maize_X.csv
│     -- crop_name
│     -- adm_id
│     -- awc (available water capacity)
│     -- bulk_density
│     -- drainage_class
│
└─── soil_moisture_maize_X.csv
│     -- crop_name
│     -- adm_id
│     -- date (in the format YYYYMMdd)
│     -- ssm (surface soil moisture)
│     -- rsm (rootzone soil moisture)
│
└─── yield_maize_X.csv
│     -- crop_name
│     -- country_code
│     -- adm_id
│     -- harvest_year
│     -- yield
│     -- harvest_area
│     -- production
```
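Given this column layout, a predictor time series can be linked to the yield table through `adm_id`. The following sketch uses tiny made-up CSV extracts (the `adm_id` value `X-01` and all numbers are hypothetical) and averages in-season fpar per region and year. Matching the calendar year of the observation to `harvest_year` is a simplifying assumption; seasons that cross the year boundary need more careful handling.

```python
import csv
import io
from datetime import datetime

# Tiny made-up extracts that follow the documented column layout.
FPAR_CSV = """crop_name,adm_id,date,fpar
maize,X-01,20200410,20.0
maize,X-01,20200601,40.0
maize,X-01,20200901,60.0
maize,X-01,20201201,10.0
"""
CALENDAR_CSV = """crop_name,adm_id,sos,eos
maize,X-01,120,300
"""
YIELD_CSV = """crop_name,country_code,adm_id,harvest_year,yield,harvest_area,production
maize,X,X-01,2020,5.2,1000,5200
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def in_season(doy, sos, eos):
    """True if a day of year lies in [sos, eos]; handles seasons crossing 31 Dec."""
    return sos <= doy <= eos if sos <= eos else (doy >= sos or doy <= eos)


calendar = {r["adm_id"]: (int(r["sos"]), int(r["eos"])) for r in read_rows(CALENDAR_CSV)}

# Collect in-season fpar values per (adm_id, calendar year).
season_values = {}
for row in read_rows(FPAR_CSV):
    day = datetime.strptime(row["date"], "%Y%m%d").date()  # YYYYMMdd
    sos, eos = calendar[row["adm_id"]]
    if in_season(day.timetuple().tm_yday, sos, eos):
        season_values.setdefault((row["adm_id"], day.year), []).append(float(row["fpar"]))

# Attach the seasonal mean to each yield record via adm_id and harvest_year.
table = []
for row in read_rows(YIELD_CSV):
    values = season_values.get((row["adm_id"], int(row["harvest_year"])), [])
    mean_fpar = sum(values) / len(values) if values else None
    table.append((row["adm_id"], int(row["harvest_year"]), float(row["yield"]), mean_fpar))

print(table)
```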
centroids.zip and polygons.zip contain the geometries of administrative regions as centroids (x and y coordinates) and polygons (multipolygons), respectively. They are organized as follows:
centroids
│
└─── AO
│     -- AO.cpg
│     -- AO.dbf
│     -- AO.prj
│     -- AO.shp
│     -- AO.shx
└─── AR
│     -- AR.cpg
│     -- AR.dbf
│     -- AR.prj
│     -- AR.shp
│     -- AR.shx
...

polygons
│
└─── AO
│     -- AO.cpg
│     -- AO.dbf
│     -- AO.prj
│     -- AO.shp
│     -- AO.shx
└─── AR
│     -- AR.cpg
│     -- AR.dbf
│     -- AR.prj
│     -- AR.shp
│     -- AR.shx
...
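After extracting either archive, it can be useful to confirm that every country folder carries the full set of shapefile components shown in the listing. The check below is a minimal sketch using only the standard library; the function name is hypothetical, and the demonstration runs on a throwaway directory that mimics the layout rather than on the real data.

```python
import tempfile
from pathlib import Path

# File extensions shown in the listing for each country.
EXTENSIONS = (".cpg", ".dbf", ".prj", ".shp", ".shx")


def missing_components(root):
    """Map each country folder under root to the shapefile parts it lacks."""
    report = {}
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        missing = [ext for ext in EXTENSIONS if not (folder / (folder.name + ext)).is_file()]
        if missing:
            report[folder.name] = missing
    return report


# Demonstration on a temporary directory: AO is complete, AR is not.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "centroids"
    for country, exts in {"AO": EXTENSIONS, "AR": (".dbf", ".shp")}.items():
        (root / country).mkdir(parents=True)
        for ext in exts:
            (root / country / (country + ext)).touch()
    result = missing_components(root)

print(result)
```

An empty dictionary means every country folder has all five components.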
Data access
The full dataset can be downloaded directly from Zenodo or by using the `zenodo_get` library.
License and citation
We kindly ask all users of CY-Bench to properly respect licensing and citation conditions of the datasets included.
Version Notes
1.0 is the dataset submitted to the NeurIPS Datasets and Benchmarks Track. The paper and discussions are here: https://openreview.net/forum?id=jkJDNG468g#discussion
1.1 and 1.2 fix some issues with column names and mismatches in adm_id between yield data and input data.
1.3 includes location information in the form of centroids and polygons of admin regions.
1.4 updates the fpar data for 2023. fpar data was incomplete for 2023 in earlier versions (due to unavailability in the data source itself).
1.5 fixes an issue in the crop calendar.
1.6 fixes an issue in the ndvi time series.
1.7 updates storage precision to 3 decimal places to reduce data size.
1.8 filters out invalid yield values.
Created:
2025-03-11



