Training and validation data for artificial neural networks using three-dimensional partial convolutions to fill gaps in satellite image time series
Source: Mendeley Data | Updated: 2024-05-10 | Indexed: 2024-06-30
Download link:
https://zenodo.org/records/6838652
Description:
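This dataset contains training and validation data for artificial neural networks that use three-dimensional partial convolutions to fill gaps in satellite image time series. The data are derived from Sentinel-5P total column carbon monoxide observations from the offline processing stream.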
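For readers unfamiliar with partial convolutions, the following is a minimal, single-channel sketch of one 3D partial convolution step over a (time, row, column) block. It follows the 2D image-inpainting formulation of Liu et al. (2018). Only valid inputs contribute to the weighted sum. The sum is rescaled by the window size divided by the number of valid inputs in the window. The mask is then set to 1 wherever at least one valid input was seen. Several choices here are illustrative assumptions: the extension to three dimensions, the zero padding, and the function name `partial_conv3d`. The `STpconv` model behind this dataset may differ in these details.

```python
import numpy as np


def partial_conv3d(x, mask, weight, bias=0.0):
    """Single-channel 3D partial convolution with zero padding and 'same' output size.

    x, mask: arrays of shape (T, H, W); mask is 1 for valid pixels, 0 for gaps.
    weight:  kernel of shape (kt, kh, kw) with odd sizes.
    Returns the filtered block and the updated mask.
    """
    kt, kh, kw = weight.shape
    pad = ((kt // 2, kt // 2), (kh // 2, kh // 2), (kw // 2, kw // 2))
    # X * M: gaps (including NaN values) contribute zero; the padded border counts as a gap
    xm = np.pad(np.where(mask > 0, x, 0.0), pad)
    mp = np.pad((mask > 0).astype(float), pad)
    out = np.zeros(x.shape, dtype=float)
    new_mask = np.zeros(x.shape, dtype=np.uint8)
    for t, i, j in np.ndindex(*x.shape):
        xw = xm[t:t + kt, i:i + kh, j:j + kw]
        n_valid = mp[t:t + kt, i:i + kh, j:j + kw].sum()
        if n_valid > 0:
            # rescale by window size / number of valid inputs, so the result
            # does not shrink as more of the window is missing
            out[t, i, j] = (weight * xw).sum() * weight.size / n_valid + bias
            new_mask[t, i, j] = 1  # at least one valid input: pixel counts as filled
    return out, new_mask
```

As a check, take an averaging kernel such as `np.full((3, 3, 3), 1 / 27)` applied to a constant input. Every pixel whose window contains at least one valid value is filled with that constant, however many of its neighbours are missing. Stacking such layers lets larger gaps shrink step by step as the mask is updated.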
Preprocessing: the following operations have been applied to the original S5P imagery:
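1. Images were resampled to a spatial resolution of 0.1 × 0.1 degrees.
2. Pixels with a quality assessment value less than or equal to 0.5 were set to NA (missing).
3. Images were aggregated by day of observation.
4. Images were cropped to latitudes between -60° and 60°.
5. Images were divided into spatiotemporal blocks of 128 × 128 pixels and 16 days.
The imagery was recorded between 2021-01-01 and 2021-11-25.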
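Steps 2, 4 and 5 can be sketched in NumPy as follows. This is a minimal illustration, not the authors' processing code. It assumes the imagery has already been resampled to the 0.1° grid (step 1) and aggregated by day (step 3) into arrays of shape (days, rows, columns). It also assumes that `lats` holds the latitude of each row. Incomplete blocks at the edges are simply dropped, which the description does not specify. The function name `preprocess` is hypothetical; the default block sizes are the ones stated above.

```python
import numpy as np


def preprocess(co, qa, lats, block_xy=128, block_t=16):
    """Hypothetical sketch of QA masking, latitude cropping and block splitting.

    co, qa: daily arrays of shape (T, Y, X) on a regular 0.1 degree grid.
    lats:   latitude of each of the Y rows.
    Returns a dict mapping (time_index, row_index, col_index) to a block of
    shape (block_t, block_xy, block_xy).
    """
    # Step 2: pixels with a QA value <= 0.5 (or a missing QA value) become NaN
    co = np.where(qa > 0.5, co, np.nan)
    # Step 4: keep only rows between -60 and 60 degrees latitude
    keep = (lats >= -60) & (lats <= 60)
    co = co[:, keep, :]
    # Step 5: split into non-overlapping blocks; incomplete edge blocks are dropped (assumption)
    n_t = co.shape[0] // block_t
    n_y = co.shape[1] // block_xy
    n_x = co.shape[2] // block_xy
    blocks = {}
    for bt in range(n_t):
        for by in range(n_y):
            for bx in range(n_x):
                blocks[(bt, by, bx)] = co[bt * block_t:(bt + 1) * block_t,
                                          by * block_xy:(by + 1) * block_xy,
                                          bx * block_xy:(bx + 1) * block_xy]
    return blocks
```

The dictionary keys play the role of the spatial and temporal block indexes. How the published files number their blocks is not described here, so the key layout is an assumption.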
Note that both the training and the validation blocks were randomly sampled from all available blocks.
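Data format and naming conventions:
Input and output data blocks are stored as GeoTIFF files in which bands represent time. The following file naming conventions apply: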
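- Files starting with `X` contain input measurements for training, to which artificial gaps have been added.
- Files starting with `Y` contain the true measurements without artificially added gaps (in many cases they still contain gaps).
- Files starting with `MASK` contain binary masks of the input data, in which pixels with valid measurements are 1 and all others are 0.
- Files starting with `VALMASK` contain a binary mask in which only pixels that are available in `Y` but not in `X` are 1. This mask is used for validation on the artificially removed pixels only.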
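The relationship between the four file types can be expressed with a few array operations. The sketch below assumes that each file has been read into a NumPy array of shape (time, rows, columns), as returned for example by rasterio's `read()` for a multi-band GeoTIFF. It also assumes that missing values are represented as NaN; how NA is actually encoded in the GeoTIFF files is not stated here. The sketch derives `MASK` and `VALMASK` from `X` and `Y` and computes a root mean squared error restricted to the `VALMASK` pixels. RMSE is only an example metric, and the function names are hypothetical.

```python
import numpy as np


def make_masks(x, y):
    """Derive MASK and VALMASK arrays from an input block X and a target block Y."""
    # MASK: 1 where the input X has a valid measurement, 0 elsewhere
    mask = (~np.isnan(x)).astype(np.uint8)
    # VALMASK: 1 only where Y has a value but X does not (artificially removed pixels)
    valmask = (~np.isnan(y) & np.isnan(x)).astype(np.uint8)
    return mask, valmask


def valmask_rmse(pred, y, valmask):
    """RMSE of a prediction against Y, evaluated on VALMASK pixels only."""
    sel = valmask.astype(bool)
    return float(np.sqrt(np.mean((pred[sel] - y[sel]) ** 2)))
```

Scoring a prediction with `valmask_rmse(pred, y, valmask)` ignores two kinds of pixels: those that were already missing in `Y`, which have no reference value, and those still present in `X`, which a model could simply copy.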
Numbers in the file names encode the spatial and temporal block indexes.
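In addition, the `predictions` directory contains predictions for the validation blocks from different models, with one subfolder per model:
- `mean`: simple block-wise mean prediction.
- `timeseries`: simple linear time series interpolation.
- `gapfill`: the method proposed in [1].
- `stmra`: the method proposed in [2].
- `STpconv`: predictions based on an artificial neural network with three-dimensional partial convolutions.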
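The two simple baselines can be approximated as below. The source only names them, so the details are assumptions. The `mean` baseline is taken to fill every gap in a block with the mean of all valid values in that block. The `timeseries` baseline is taken to interpolate each pixel linearly along the time axis with `np.interp`, which holds the nearest valid value constant before the first and after the last observation. In this sketch, pixels with fewer than two valid time steps are left missing. Both function names are hypothetical.

```python
import numpy as np


def mean_fill(block):
    """Fill all gaps (NaN) in a (time, rows, cols) block with the block-wise mean."""
    out = block.copy()
    out[np.isnan(out)] = np.nanmean(block)
    return out


def linear_time_fill(block):
    """Fill gaps by per-pixel linear interpolation along the time axis."""
    out = block.copy()
    t = np.arange(block.shape[0])
    for i, j in np.ndindex(*block.shape[1:]):
        series = block[:, i, j]
        valid = ~np.isnan(series)
        if valid.sum() >= 2:  # need at least two observations to interpolate
            out[~valid, i, j] = np.interp(t[~valid], t[valid], series[valid])
    return out
```

Both baselines ignore spatial neighbours within a time step. This is what makes them useful reference points for the spatiotemporal methods in the other subfolders.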
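References:
[1] Gerber, F., de Jong, R., Schaepman, M. E., Schaepman-Strub, G., & Furrer, R. (2018). Predicting missing values in spatio-temporal remote sensing data. IEEE Transactions on Geoscience and Remote Sensing, 56(5), 2841-2853.
[2] Appel, M., & Pebesma, E. (2020). Spatiotemporal multi-resolution approximations for analyzing global environmental data. Spatial Statistics, 38, 100465.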
Created:
2023-06-28



