A Multisource Grapevine Phenology Dataset for Smart Farming and AI Modeling
Source: DataCite Commons. Updated: 2026-05-03. Indexed: 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.17930722
Resource summary:
Description
Artificial Intelligence and Machine Learning rely on large, high-quality datasets to build accurate and robust models, yet data scarcity remains a major challenge, especially in smart farming. Agricultural data are highly diverse and heterogeneous, which complicates model development. Phenology modeling, a key application, studies how plant biological events relate to climate and seasons. Accurate phenology models improve crop quality, support climate adaptation, and guide decisions such as pesticide use and harvesting, enhancing environmental and economic sustainability.
This study introduces a georeferenced dataset for Machine Learning-based grapevine phenology prediction across 3 Protected Designations of Origin in Aragón, Spain. Developed by a multidisciplinary team, the dataset combines 9 datasets from 8 sources (including meteorological time series, field phenology observations, and Copernicus Sentinel-2 multispectral imagery) covering the period 2016–2022. It supports both physical and ML-based phenology modeling and facilitates knowledge extraction in agronomy and plant biology. Its relevance lies in its comprehensive scope, the inclusion of 9 phenological stages, and a rigorous methodology ensuring reproducibility. This framework enables the creation of similar datasets for other regions or crops, advancing smart farming through scalable, data-driven solutions. We further anticipate its potential contribution to developing Foundation Models as well as to the creation.
Dataset Structure
The dataset contains a description document, 2 main CSV files and 1 supporting folder:
DIF Description.pdf: description of the dataset.
DIF phenologicalstages.csv: maps the phenologystageid values in the file “DIF GrapevinePehologyDataset.csv” to the corresponding BBCH values.
DIF GrapevinePehologyDataset.csv: contains the dataset used to train the models we presented.
Metadata: this folder contains the JSON files describing the dataset and its content:
DIF_DataSetDescription.json: the description contained in “DIF Description.pdf”, in JSON format.
DIF GrapevinePehologyDataset.json: contains the dataset presented in this paper, which was used to train the models we presented in [51, 50]. We describe the content of the file in the following sections.
Intended Use
The goal of this dataset is to enable the development of models for predicting grapevine phenology in the three Protected Designations of Origin in Aragón (Spain), using data from field observations, meteorological stations, and NDVI derived from Copernicus Sentinel-2 multispectral imagery. It also supports the calibration of physical models for these regions by including cold and heat accumulation indices. These indices are calculated from the traditional start dates of January 1 and February 1 of the corresponding year, as well as from the date when plants enter dormancy: the first autumn day on which the maximum temperature does not exceed 10 °C.
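The dormancy rule described above (first autumn day whose maximum temperature does not exceed 10 °C) can be sketched as follows. This is a minimal illustration and not the authors' code: the function name, the input layout (a mapping from date to daily maximum temperature) and the autumn start date of September 22 are assumptions made for the example, and the temperature values are invented.

```python
from datetime import date

def dormancy_start(daily_tmax, autumn_start, threshold_c=10.0):
    """Return the first date on or after `autumn_start` whose maximum
    temperature does not exceed `threshold_c`, or None if there is none."""
    for day in sorted(daily_tmax):
        if day >= autumn_start and daily_tmax[day] <= threshold_c:
            return day
    return None

# Hypothetical daily maximum temperatures (degrees Celsius), for illustration only.
tmax = {
    date(2020, 9, 10): 9.5,    # cold day, but before the assumed autumn start
    date(2020, 10, 30): 14.2,
    date(2020, 11, 14): 10.0,  # first autumn day with Tmax <= 10
    date(2020, 11, 15): 8.1,
}

# September 22 is an assumed autumn start; the source does not state one.
start = dormancy_start(tmax, autumn_start=date(2020, 9, 22))
print(start)
```

The date returned by such a function would then serve as the "t0" start of accumulation for the cold and heat indices, alongside the fixed January 1 and February 1 start dates.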
Access Conditions: This dataset is publicly available under the terms of the Creative Commons Attribution 4.0 International license.
Specifications Table
Subject
Smart farming
Specific subject area
The dataset is based on data from 3 Protected Designations of Origin (Calatayud, Cariñena and Campo de Borja) in Aragón, northeastern Spain. It was built by merging 9 georeferenced time-series datasets from 8 data sources, covering the period from 2016 to 2022. It includes meteorological data (measurements, estimates, and forecasts), qualitative field phenology observations, and Copernicus Sentinel-2 multispectral imagery.
Type of data
Analyzed, Filtered, Processed, Multi-source
Data collection
The data merged in the dataset come from 9 georeferenced datasets obtained from 8 data sources. The datasets considered are:
· Red FARA phenological registry [1]: this dataset has restricted access. It provides phenology field observations on the control parcels.
· Spanish Cadastral Registry (Catastro) [2]: used to normalize Red FARA records and to obtain the NDVI of the control parcels from Copernicus Sentinel 2 images.
· Aragón Open Data Common Agrarian Policy Registry (CAP) [3]: used together with the Catastro data to normalize the Red FARA records.
· SIAR [4] and Grapevine [5] climatic station networks: provide meteorological data.
· ERA5 climatic estimations [6] and ECMWF IFS forecasts [7]: used to fill gaps in the climatic data and to supply the forecast data needed to perform predictions.
· Copernicus Sentinel 2 multispectral images [8]: these images are used to determine the NDVI of the control parcels used to create the dataset.
To access these datasets we used their available APIs, all of which are public and provide open access. The 2 exceptions were Red FARA, which has restricted access, and the ERA5 data, which we accessed through the openMeteo API [8] to ease our work. The data access and the transformations applied to the data were coded in Python. A detailed explanation of the transformations performed can be found in [9].
Data source location
Country: Spain.
Region: Aragón.
Protected Designation of Origin: Calatayud, Campo de Borja, Cariñena.
Coordinates: bounding box defined by the points (41.98107, −2.177578) and (41.166320, −0.922575) in WGS84 coordinates.
Data accessibility
Repository name: Zenodo
Data identification number: 10.5281/zenodo.17930723
Direct URL to data: https://doi.org/10.5281/zenodo.17930723
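The two corner points given under "Data source location" can be used to check whether a parcel centroid falls inside the study area. The sketch below assumes that each pair is written as (latitude, longitude), which the text does not state explicitly; the function name and the test coordinates are hypothetical.

```python
# Corner points copied from the text, assumed to be (latitude, longitude) in WGS84.
CORNER_A = (41.98107, -2.177578)
CORNER_B = (41.166320, -0.922575)

def in_study_area(lat, lon, a=CORNER_A, b=CORNER_B):
    """Return True if (lat, lon) lies inside the box spanned by corners a and b."""
    lat_min, lat_max = sorted((a[0], b[0]))
    lon_min, lon_max = sorted((a[1], b[1]))
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max

# Hypothetical centroids, for illustration only.
print(in_study_area(41.5, -1.5))   # inside the box
print(in_study_area(40.0, -1.5))   # south of the box
```

A filter like this is a cheap sanity check on the longitude and latitude columns before joining records with external rasters or station data.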
References
[1]
Government of Aragón. Red FARA Home Page. Last access: January 10, 2026. 2026. url: http://web.redfara.es.
[2]
Spanish Treasury. Spanish Cadastral Registry Electronic Home Page. Last access: January 10, 2026. 2026. url: https://www.sedecatastro.gob.es/.
[3]
Government of Aragón. Aragón Open Data Home Page. Last access: January 10, 2026. 2026. url: https://opendata.aragon.es.
[4]
Spanish Ministry of Agriculture, Fisheries and Food. Agro-climatic Information System for Irrigation (SIAR) Home Page. Last access: January 10, 2026. 2026. url: https://eportal.mapa.gob.es//websiar/Inicio.aspx.
[5]
Grapevine Project Consortium. Grapevine Project Home Page. Grant agreement ID: 863463. https://grapevine-project.eu (Last access: August 8, 2024), https://web.archive.org/web/20230922054033/https://grapevine-project.eu (Last access: January 10, 2026), https://www.egi.eu/case-study/grapevine (Last access: January 10, 2026). 2022.
[6]
Copernicus Climate Change Service (C3S). ERA5 hourly data on single levels from 1959 to present. Last access: January 10, 2026. 2023. url: https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels?tab=overview.
[7]
European Centre for Medium-Range Weather Forecasts (ECMWF). ECMWF Open Data. Last access: January 10, 2026. 2026. url: https://www.ecmwf.int/en/forecasts/datasets/open-data.
[8]
F. Gascon et al. “Copernicus Sentinel-2 mission: products, algorithms and Cal/Val”. In: Earth Observing Systems XIX. Ed. by James J. Butler, Xiaoxiong (Jack) Xiong, and Xingfa Gu. SPIE, Sept. 2014, pp. 1–9. doi: 10.1117/12.2062260. url: http://dx.doi.org/10.1117/12.2062260.
[9]
Francisco José Lacueva-Pérez et al. “Developing machine learning models from multisourced real-world datasets to enhance smart-farming practices”. In: Computers and Electronics in Agriculture 231 (Apr. 2025), p. 110018. issn: 0168-1699. doi: 10.1016/j.compag.2025.110018. url: http://dx.doi.org/10.1016/j.compag.2025.110018.
File “DIF_GrapevinePehologyDataset.csv” Description
File DIF_GrapevinePehologyDataset.csv contains the dataset presented in this paper. Each record represents the data considered for a given parcel (vineyard) on a given date. The following table describes the fields contained in the dataset. To keep the table compact, we use an abbreviated notation for the field names: for some field names we include an asterisk (“*”) in the name, followed by a pair of numbers in brackets (“[…]”) giving the range of integer values that can replace the “*” in the dataset. We did this, for example, for fields which provide the values of a given variable for the n days before (days_after) and after (days_adelante) the date of the record. Some examples:
· tmed_min *_days_after [1,13]: the dataset contains all the fields tmed_min 1_days_after, tmed_min 2_days_after, ..., tmed_min 13_days_after, which represent, for the given parcel, the minimum temperature for each of the n days before the date of the record.
· wind_NE *_days_after [1,6]: the dataset contains all the fields wind_NE 1_days_after, wind_NE 2_days_after, ..., wind_NE 6_days_after, which represent, for the given parcel, the wind_NE index for each of the n days after the date of the record.
· gdd_4.5_t0_Tbase_sum *_weeks_before [1,2]: the dataset contains the fields gdd_4.5_t0_Tbase_sum 1_weeks_before and gdd_4.5_t0_Tbase_sum 2_weeks_before, which represent, for the given parcel, the GDD calculated using a base temperature of 4.5 ºC and accumulated from the beginning of the season, 1 and 2 weeks before the date of the record.
Moreover, we use “|” to denote choices (expressed within brackets “[…]”), so that one entry can represent several attributes. For example, “rad_[min|MAX|mean]” is a condensed way of writing 3 different variables: “rad_min”, “rad_MAX” and “rad_mean”. Other notations can be interpreted similarly. The full list of variable names is shown in Appendix A.
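The abbreviated notation can be expanded mechanically. The sketch below is only an illustration of the two rules described here (an integer range replacing “*”, and “|” choices inside brackets); the function names are hypothetical, and the exact separators used in the real CSV header should be checked against the file, since the expansion simply reproduces the spelling shown in this description.

```python
import itertools
import re

# Matches a bracketed group that contains at least one "|", e.g. "[min|MAX|mean]".
# Range brackets such as "[1,13]" contain no "|" and are left untouched.
CHOICE_GROUP = re.compile(r"\[([^\[\]]*\|[^\[\]]*)\]")

def expand_choices(pattern):
    """Expand every "[a|b|c]" group in `pattern` into all combinations."""
    parts = CHOICE_GROUP.split(pattern)
    literals = parts[0::2]                            # text between groups
    groups = [g.split("|") for g in parts[1::2]]      # options of each group
    names = []
    for combo in itertools.product(*groups):
        pieces = [literals[0]]
        for choice, literal in zip(combo, literals[1:]):
            pieces.append(choice)
            pieces.append(literal)
        names.append("".join(pieces))
    return names

def expand_range(pattern, first, last):
    """Replace "*" in `pattern` with each integer from `first` to `last`."""
    return [pattern.replace("*", str(n)) for n in range(first, last + 1)]

rad_fields = expand_choices("rad_[min|MAX|mean]")
tmed_fields = expand_range("tmed_min *_days_after", 1, 13)
gdd_fields = expand_choices("gdd_[4.5|10.0]_[t0|1|2]_[TBase|TbaseMAX]_sum")
print(rad_fields)
print(tmed_fields[0], tmed_fields[-1])
print(len(gdd_fields))
```

Combining both functions (choices first, then the range on each resulting name) yields the full set of columns that a single abbreviated table row stands for.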
Field Name (abbreviated notation)
Description
phenologystageid
Id of the phenological stage of the parcel on the given date. See file “DIF phenologicalstages.csv”.
variety
Grapevine variety: Cabernet Sauvignon, Chardonnay, Garnacha, Mazuela, Syrach, Tempranillo.
codigo
Id of the parcel in the Spanish Cadastral Registry.
longitude
Longitude of the centroid of the parcel.
latitude
Latitude of the centroid of the parcel.
altitudeASL
Altitude above sea level (ASL) of the centroid of the parcel.
PDO_id
Id of the Protected Designation of Origin (PDO): Calatayud, Cariñena and Campo de Borja.
date
The date of the record.
station
The name of the climatic station whose data are considered.
season
The season to which the record belongs.
day
The DOY (day of the year).
"PDO_Borja", "PDO_Calatayud", "PDO_Carinena", "PDO_Somontano"
Boolean values which are true when the record corresponds to the given PDO.
"variety_CABERNET SAUVIGNON", "variety_CHARDONNAY", "variety_GARNACHA", "variety_MAZUELA", "variety_SYRACH", "variety_TEMPRANILLO"
Boolean values which are true when the record corresponds to a field with the given variety.
min, MAX, mean, std, median, diff
Values derived from the NDVI indexes calculated for each parcel from the Copernicus Sentinel 2 multispectral images. They represent the minimum, maximum, average, standard deviation, median and difference values.
tmed_[min|MAX|mean]
[Minimum|Maximum|Mean] temperature for the given date (ºC).
tmed_[min|MAX|mean] *_days_after [1,13]
[Minimum|Maximum|Mean] temperatures for the 13 days before the given date.
tmed_[min|MAX|mean] *_days_after [1,6]
[Minimum|Maximum|Mean] temperatures for the 6 days following the given date.
rad_[min|MAX|mean]
[Minimum|Maximum|Mean] radiation for the given date (W/m²).
rad_[min|MAX|mean] *_days_after [1,13]
[Minimum|Maximum|Mean] radiation for the 13 days before the given date.
rad_[min|MAX|mean] *_days_after [1,6]
[Minimum|Maximum|Mean] radiation for the 6 days following the given date.
hr_mean
Average air relative humidity for the given date (%).
hr_mean *_days_after [1,13]
Average air relative humidity for the 13 days before the given date.
hr_mean *_days_after [1,6]
Average air relative humidity for the 6 days following the given date.
wind_[N|NE|E|SE|S|SW|W|NW] *_days_after [1,13]
Wind index for each direction sector (North, North-East, East, South-East, South, South-West, West, North-West) for the 13 days before the given date.
wind_[N|NE|E|SE|S|SW|W|NW] *_days_after [1,6]
Wind index for each direction sector (North, North-East, East, South-East, South, South-West, West, North-West) for the 6 days following the given date.
gdd_[4.5|10.0]_[t0|1|2]_[TBase|TbaseMAX]_sum
GDD heat accumulation index, calculated with a base temperature of 4.5 ºC or 10.0 ºC; accumulated since the beginning of the season (t0), January 1 of the date’s year (1) or February 1 (2); with (TbaseMAX, 35 ºC) or without (TBase) a maximum temperature threshold above which heat accumulation stops; and with the daily contribution calculated from the minimum and maximum temperatures of the given day (sum).
gdd_[4.5|10.0]_[t0|1|2]_[TBase|TbaseMAX]_sum *_weeks_before [1|2]
The same GDD heat accumulation index (same base temperature, start date, threshold and daily contribution options) for the 2 weeks before the given day.
gdd_[4.5|10.0]_[t0|1|2]_[TBase|TbaseMAX]_sum 1_weeks_after
The same GDD heat accumulation index (same base temperature, start date, threshold and daily contribution options) for the week after the given day.
ChillingDD_7.0_[t0|1|2]_[TBase|Tbasemin]_sum
Richardson cold accumulation index, calculated with a base temperature of 7.0 ºC; accumulated since the beginning of the season (t0), January 1 of the date’s year (1) or February 1 (2); with (Tbasemin, -7 ºC) or without (TBase) a minimum temperature threshold below which cold accumulation stops; and with the daily contribution calculated from the minimum and maximum temperatures of the given day (sum).
ChillingDD_7.0_[t0|1|2]_[TBase|Tbasemin]_sum *_weeks_before [1|2]
The same Richardson cold accumulation index (same base temperature, start date, threshold and daily contribution options) for the 2 weeks before the given day.
ChillingDD_7.0_[t0|1|2]_[TBase|Tbasemin]_sum 1_weeks_after
The same Richardson cold accumulation index (same base temperature, start date, threshold and daily contribution options) for the week after the given day.
ChillingDD_7.0_[t0|1|2]_Utah_sum
Utah cold accumulation index, calculated with a base temperature of 7.0 ºC; accumulated since the beginning of the season (t0), January 1 of the date’s year (1) or February 1 (2); with (Tbasemin, -7 ºC) or without (TBase) a minimum temperature threshold below which cold accumulation stops; and with the daily contribution calculated from the minimum and maximum temperatures of the given day (sum).
ChillingDD_7.0_[t0|1|2]_Utah_sum *_weeks_before [1|2]
The same Utah cold accumulation index (same base temperature, start date and daily contribution options) for the 2 weeks before the given day.
ChillingDD_7.0_[t0|1|2]_Utah_sum 1_weeks_after
The same Utah cold accumulation index (same base temperature, start date and daily contribution options) for the week after the given day.
rad_sum
Accumulated radiation since the beginning of the season until the given date.
rad_sum *_weeks_before [1|2]
Accumulated radiation since the beginning of the season until 1 or 2 weeks before the given date.
rad_sum 1_weeks_after
Accumulated radiation since the beginning of the season until the week after the given date.
precip_sum
Accumulated precipitation since the beginning of the season until the given date.
precip_sum *_weeks_before [1|2]
Accumulated precipitation since the beginning of the season until 1 or 2 weeks before the given date.
precip_sum 1_weeks_after
Accumulated precipitation since the beginning of the season until the next week after the given date.
winkler_[4.5|10.0]_[t0|1|2]_[TBase|TbaseMAX]_sum
Winkler heat accumulation index, calculated with a base temperature of 4.5 ºC or 10.0 ºC; accumulated since the beginning of the season (t0), January 1 of the date’s year (1) or February 1 (2); with (TbaseMAX, 35 ºC) or without (TBase) a maximum temperature threshold above which heat accumulation stops; and with the daily contribution calculated from the minimum and maximum temperatures of the given day (sum).
winkler_[4.5|10.0]_[t0|1|2]_[TBase|TbaseMAX]_sum *_weeks_before [1|2]
The same Winkler heat accumulation index (same base temperature, start date, threshold and daily contribution options) for the 2 weeks before the given day.
winkler_[4.5|10.0]_[t0|1|2]_[TBase|TbaseMAX]_sum 1_weeks_after
The same Winkler heat accumulation index (same base temperature, start date, threshold and daily contribution options) for the week after the given day.
The GDD, Winkler and Chilling (Richardson and Utah) indexes are also calculated considering the contributions of the time units (periods) to the daily contribution. These fields (or columns of the file) have the same naming schema as their counterparts based on daily calculations but with the “cumm” suffix.
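To make the options encoded in the gdd_* field names concrete, the following sketch accumulates growing degree days from daily minimum and maximum temperatures. It is not the authors' implementation: it assumes the common averaging formula max(0, (Tmin + Tmax) / 2 − Tbase), and it assumes that the 35 ºC threshold is applied by capping both temperatures at that value. The source only says that accumulation stops above the threshold, so other interpretations are possible. Function names and temperature values are hypothetical.

```python
from datetime import date

def daily_gdd(tmin, tmax, t_base, t_upper=None):
    """Daily growing degree days from the min/max temperatures of one day.

    If `t_upper` is given, both temperatures are capped at it first
    (an assumed reading of the TbaseMAX option)."""
    if t_upper is not None:
        tmin = min(tmin, t_upper)
        tmax = min(tmax, t_upper)
    return max(0.0, (tmin + tmax) / 2.0 - t_base)

def accumulated_gdd(records, start, t_base, t_upper=None):
    """Accumulate GDD over (date, tmin, tmax) records from `start` onwards.

    Returns a list of (date, cumulative GDD) pairs in chronological order."""
    total = 0.0
    series = []
    for day, tmin, tmax in sorted(records):
        if day < start:
            continue
        total += daily_gdd(tmin, tmax, t_base, t_upper)
        series.append((day, total))
    return series

# Hypothetical daily temperatures (degrees Celsius), for illustration only.
records = [
    (date(2021, 1, 31), 2.0, 12.0),
    (date(2021, 2, 1), 4.0, 16.0),
    (date(2021, 2, 2), 8.0, 20.0),
    (date(2021, 2, 3), 20.0, 40.0),
]

# Analogue of a "10.0 / start January 1 / TBase" field (no upper threshold).
plain = accumulated_gdd(records, date(2021, 1, 1), t_base=10.0)
# Analogue of a "10.0 / start January 1 / TbaseMAX" field (35 ºC threshold).
capped = accumulated_gdd(records, date(2021, 1, 1), t_base=10.0, t_upper=35.0)
# Analogue of a "4.5 / start February 1 / TBase" field.
feb = accumulated_gdd(records, date(2021, 2, 1), t_base=4.5)
print(plain[-1][1], capped[-1][1], feb[-1][1])
```

The chilling indices follow the same accumulation pattern with the sign of the daily contribution reversed; the period-based “cumm” variants would need sub-daily temperatures, which this sketch does not cover.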
File “DIF phenologicalstages.csv” Description
This file contains a description of the different types of phenological stages considered. The fields are:
itainnovaid: this is an identifier of the phenological stage.
bbch: the number of the stage in the BBCH phenological scale.
Descripción BBCH: this is a textual description of the previous BBCH phenological stage.
The contents of the file are as follows:
itainnovaid | bbch | Descripción BBCH
0 | 0 | Winter dormancy or resting period
3 | 63 | Early flowering: 30% of flowerhoods fallen
1 | 11 | First leaf unfolded and spread away from shoot
2 | 15 | 5 leaves unfolded
4 | 65 | Full flowering: 50% of flowerhoods fallen
6 | 71 | Fruit set: young fruits begin to swell, remains of flowers
5 | 68 | 80% of flowerhoods fallen
7 | 75 | 50% of fruits have reached final size or fruit has reached 50% of final size
8 | 77 | 70% of fruits have reached final size or fruit has reached 70% of final size
9 | 81 | Beginning of ripening or fruit colouration
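As a small illustration of how the phenologystageid values of the main file relate to BBCH codes, the mapping listed above can be held in a dictionary. The id and BBCH values are transcribed from the table; the variable and function names are hypothetical, and in practice the mapping would be read from “DIF phenologicalstages.csv” rather than typed by hand.

```python
# itainnovaid -> BBCH code, transcribed from the stage table in this section.
ID_TO_BBCH = {
    0: 0,    # Winter dormancy or resting period
    1: 11,   # First leaf unfolded
    2: 15,   # 5 leaves unfolded
    3: 63,   # Early flowering
    4: 65,   # Full flowering
    5: 68,   # 80% of flowerhoods fallen
    6: 71,   # Fruit set
    7: 75,   # 50% of final fruit size
    8: 77,   # 70% of final fruit size
    9: 81,   # Beginning of ripening
}

def to_bbch(stage_ids):
    """Translate a sequence of stage ids into BBCH codes."""
    return [ID_TO_BBCH[stage_id] for stage_id in stage_ids]

# Hypothetical sequence of observed stage ids for one parcel.
observed = [0, 1, 3, 9]
print(to_bbch(observed))

# The ids are ordered consistently with the BBCH scale, so they can be
# treated as an ordinal target as well as a categorical one.
codes_in_id_order = [ID_TO_BBCH[i] for i in sorted(ID_TO_BBCH)]
print(codes_in_id_order == sorted(codes_in_id_order))
```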
Provider:
Zenodo
Created:
2026-04-08



