
SD4EO: AI-based synthetic satellite multispectral agricultural textures in Spain (Oct 2017 - Sep 2018)

Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/11220859
Description:
This dataset was created as part of the deliverables for ESA’s SD4EO project. It consists of textures generated with a multispectral variant of a still-unpublished high-order statistical constraint synthesis method, for each of the following crop types:

- Barley
- Wheat
- Other grain leguminous
- Peas
- Fallow & Bare soil
- Vetch
- Alfalfa
- Sunflower
- Oats

The initial data was sampled from satellite images, specifically from Copernicus’ Sentinel-1 and Sentinel-2 satellites. The images were acquired between October 2017 and September 2018 over the central-east region of northern Spain (Castile and León and Catalonia). The corresponding crop fields were extracted from these images and used as samples to assemble large "puzzles", which served as the input reference images for generating the synthetic images that make up this dataset.

Each assembled crop field puzzle combines the largest crop areas into a square multispectral texture of the largest possible size that is a power of 2 (or nearly a power of 2). Each base image combines data from all available Sentinel-2 satellite passes for the same month with a previous monthly composite from Sentinel-1. Because cloud masks change the shape and number of usable crop fields at each time sample, the arrangement of elements in a puzzle cannot be reused across months. There is therefore one base image (puzzle) per month and crop type, whose size depends on the number and area of crop fields not covered by clouds. Base image side lengths are 256, 384, 512, 768, 1024, 1536, or 2048 pixels, depending on weather conditions and on the crop type in each season of the year. In this dataset, the synthetic texture sizes match the corresponding base image sizes, to ease debugging of the method implementation and to enable subsequent comparisons.
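The listed side lengths are all either a power of 2 or 1.5 times a power of 2, which is one plausible reading of "nearly a power of 2". The following is a minimal sketch of that reading; `is_near_power_of_two` and `largest_allowed_side` are hypothetical helpers written for illustration, not part of the SD4EO code, and the selection rule is an assumption.

```python
# Side lengths (pixels) listed in the dataset description.
ALLOWED_SIDES = (256, 384, 512, 768, 1024, 1536, 2048)


def is_near_power_of_two(side: int) -> bool:
    """Return True if side is 2**k or 3 * 2**k.

    Assumption: this is our interpretation of "nearly a power of 2".
    """
    if side <= 0:
        return False
    # Strip all factors of 2; what remains must be 1 or 3.
    while side % 2 == 0:
        side //= 2
    return side in (1, 3)


def largest_allowed_side(available: int):
    """Hypothetical rule: largest listed side that fits in `available` pixels.

    Returns None when even the smallest listed side does not fit.
    """
    fitting = [s for s in ALLOWED_SIDES if s <= available]
    return max(fitting) if fitting else None
```

For example, `largest_allowed_side(900)` picks 768 under this assumed rule, since 1024 would not fit.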
For crops with a base image size of 1536 pixels or larger, the generated synthetic images have been reduced to half that size to lower computational cost and RAM requirements, and so complete the synthesis faster. Consequently, file sizes still vary, and are generally smaller for crop types with less cultivated area. Additionally, to increase the amount of available data, six variants have been synthesized from each base multispectral image. This number can be increased arbitrarily, since initialization with noise (random numbers) ensures that the generated textures differ from one another.

File names are structured as follows:

- Prefix "HO", indicating the synthesis method
- The crop type name: Barley, Wheat, OtherGrainLeguminous, Peas, FallowAndBareSoil, Vetch, Alfalfa, Sunflower, Oats
- Year/Month/01 (representing the start of the month period)
- Side length of the multispectral texture in pixels (based on the highest spatial resolution of Sentinel-2: 10 m x 10 m)
- Number of the synthesis variant

The generation parameters for all images are:

- Normalized and weighted bands (VH band influence increased by a factor of 3 compared to the others)
- 4 levels of depth in the Steerable pyramid
- 6 orientations in the Steerable pyramid
- 14 joint statistics of the wavelet coefficients corresponding to basis functions at adjacent spatial locations, orientations, and scales. This parameter is crucial for capturing local dependencies between wavelet coefficients, which are essential for the visual perception of texture.
- 30 iterations

A significant effort has been made to stabilize the algorithm and to eliminate artifacts in the generated textures, resulting in much more robust outcomes. However, in rare cases the initial white noise distribution can be statistically unfavorable, leading to instabilities. Files have been left as generated, without correcting these effects, so that they remain visible despite their low frequency.
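The file naming scheme described above can be decoded with a small parser. This is a sketch only: the underscore-separated layout and the `.nc` extension are inferred from file names such as `HO_Alfalfa_20180801_768_1.nc` that appear in this description, and `TextureName` and `parse_texture_name` are hypothetical names introduced here.

```python
import re
from datetime import date
from typing import NamedTuple

# Crop type names as they appear in file names.
CROPS = (
    "Barley", "Wheat", "OtherGrainLeguminous", "Peas", "FallowAndBareSoil",
    "Vetch", "Alfalfa", "Sunflower", "Oats",
)


class TextureName(NamedTuple):
    crop: str
    period_start: date  # first day of the monthly period
    side: int           # side length in pixels (10 m per pixel)
    variant: int        # synthesis variant number


# Assumed layout: HO_<Crop>_<YYYYMMDD>_<side>_<variant>.nc
_PATTERN = re.compile(
    r"^HO_(?P<crop>[A-Za-z]+)_(?P<ymd>\d{8})_(?P<side>\d+)_(?P<variant>\d+)\.nc$"
)


def parse_texture_name(filename: str) -> TextureName:
    """Split a texture file name into its documented components."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name layout: {filename!r}")
    if match["crop"] not in CROPS:
        raise ValueError(f"unknown crop type: {match['crop']!r}")
    ymd = match["ymd"]
    start = date(int(ymd[:4]), int(ymd[4:6]), int(ymd[6:]))
    return TextureName(match["crop"], start, int(match["side"]), int(match["variant"]))
```

Calling `parse_texture_name("HO_Vetch_20180301_384_3.nc")` yields the crop `Vetch`, the period start 2018-03-01, a side of 384 pixels, and variant 3.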
Specifically, among the 657 generated multispectral textures, this phenomenon is prominent in only two and relatively noticeable in another two; the rest are free of the effect (it affects less than 1% of the syntheses). The following files can therefore be considered partially failed syntheses:

- HO_Alfalfa_20180801_768_1.nc
- HO_FallowAndBareSoil_20180101_768_3.nc
- HO_OtherGrainLeguminous_20171201_256_4.nc
- HO_Vetch_20180301_384_3.nc

Files are encoded in the standardized netCDF4 format, each containing a single xarray with metadata. The xarray holds a 3D array with the synthesized texture of the indicated crop type and satellite passes for the regions of Castilla y León and Catalonia for the corresponding monthly period.

The most important data structure is the 3D array. Its first two dimensions correspond to the pixel extent indicated in the file name, as square textures ('x' and 'y' labels in the xarray). The third dimension denotes the spectral band of the satellite, ordered by constellation and pixel size:

- 'B02' 10m (Sentinel-2)
- 'B03' 10m (Sentinel-2)
- 'B04' 10m (Sentinel-2)
- 'B08' 10m (Sentinel-2)
- 'B05' originally 20m, resampled to 10m (Sentinel-2)
- 'B06' originally 20m, resampled to 10m (Sentinel-2)
- 'B07' originally 20m, resampled to 10m (Sentinel-2)
- 'B11' originally 20m, resampled to 10m (Sentinel-2)
- 'B12' originally 20m, resampled to 10m (Sentinel-2)
- 'B8A' originally 20m, resampled to 10m (Sentinel-2)
- 'VH' also resampled to 10m (Sentinel-1)

The original dynamic range is preserved in all bands, which have been synthesized together using our multispectral algorithm variant. The new band combination may produce slightly unusual values in vegetation indices, because the constraints were imposed not in the transformed space of those indices but in the latent space of the decorrelated Steerable pyramid.
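The band ordering above can be captured in a small lookup, for instance to compute a vegetation index such as NDVI from the near-infrared (B08) and red (B04) bands. This is a sketch under assumptions: `load_texture` is a hypothetical helper, it needs the third-party xarray package with a netCDF backend, and it assumes each file holds exactly one data variable. The name of the band dimension is not given in the description, so it is not used here.

```python
# Band order along the third dimension, as listed in the description.
BANDS = ("B02", "B03", "B04", "B08", "B05", "B06",
         "B07", "B11", "B12", "B8A", "VH")


def band_index(name: str) -> int:
    """Position of a band along the third dimension (0-based)."""
    return BANDS.index(name)


def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index for one pixel.

    Returns NaN when both inputs are zero, where the index is undefined.
    """
    denom = nir + red
    if denom == 0:
        return float("nan")
    return (nir - red) / denom


def load_texture(path: str):
    """Hypothetical loader: open one texture file with xarray.

    Assumes the file contains a single data variable; xarray is imported
    lazily so the helpers above work without it installed.
    """
    import xarray as xr
    return xr.open_dataarray(path)
```

Since the synthesis did not constrain vegetation indices directly, comparing NDVI statistics between a synthetic texture and its base image is one way to check how unusual the index values are.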
Additionally, the following metadata are stored as xarray attributes:

- "long_name": the crop type name
- "date": the period of the original data used as the base image for synthesis
- "dataset": denotes the combination of the initial Castilla y León dataset and the extended 6 Tiles from Catalonia
- "synthetic_method": the high-order constrained method
- "max_visible_value": a reference value for maintaining the same dynamic range when comparing with base images, avoiding distortions in color space and contrast

A total of 9 crop types x 12 months x 6 variants = 648 synthesized multispectral textures, occupying 34.5 GB, have been organized and uploaded as 9 ZIP files (one per crop type) to the Zenodo website for distribution under the Creative Commons Attribution 4.0 International license.

The SD4EO Project is funded by ESA’s FutureEO programme under contract no. 4000142334/23/I-DT and supervised by ESA Φ-lab.
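The stated total of 9 x 12 x 6 = 648 textures can be reproduced by enumerating every crop type, month, and variant combination. This is a minimal sketch: numbering the variants 1 to 6 is an assumption based on the file names quoted in the description, and the side length is left out because it varies per crop and month. `month_starts` and `expected_combinations` are hypothetical helper names.

```python
from itertools import product

CROPS = (
    "Barley", "Wheat", "OtherGrainLeguminous", "Peas", "FallowAndBareSoil",
    "Vetch", "Alfalfa", "Sunflower", "Oats",
)
N_VARIANTS = 6  # variants synthesized per base image


def month_starts(first_year: int = 2017, first_month: int = 10, count: int = 12):
    """YYYYMM01 strings for consecutive months, October 2017 by default."""
    stamps = []
    year, month = first_year, first_month
    for _ in range(count):
        stamps.append(f"{year:04d}{month:02d}01")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return stamps


def expected_combinations():
    """All (crop, period start, variant) triples; variants assumed to run 1..6."""
    variants = range(1, N_VARIANTS + 1)
    return list(product(CROPS, month_starts(), variants))
```

The resulting list can be compared against the contents of the nine ZIP files to spot missing or unexpected textures.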
Created:
2024-10-18