2m Temperature Forecast by Deep Learning
Source: DataCite Commons. Updated: 2022-12-16. Indexed: 2024-07-13.
Download link:
https://data.fz-juelich.de/citation?persistentId=doi:10.26165/JUELICH-DATA/X5HPXP
Description:
This repository provides the preprocessed datasets used in the study "Temperature forecasting by deep learning methods" by Gong et al. (2022). With them, users can reproduce the presented results without running the preprocessing chain on the raw ERA5 data.

Data description

The datasets used to train, validate, and test the deep neural networks are based on the ERA5 reanalysis data provided by the European Centre for Medium-Range Weather Forecasts (ECMWF). Five different datasets have been created. All of them cover the years 2007 to 2019, but they span slightly different domains over Central Europe and include different meteorological variables.

The datasets are made available as compressed tar-archives (see Storage Location URL below). The file names encode some meta-information using the following naming convention:

ERA5-Y[yyyy]-[yyyy]M[mm]to[mm]-[nx]x[ny]-[nn.nn]N[ee.ee]E-[var1]_[var2]_[var3]

where
- Y[yyyy]-[yyyy]M[mm]to[mm] denotes the years and the months of the data period,
- [nx]x[ny] is the number of grid points/pixels of the target domain in longitude and latitude direction,
- [nn.nn]N[ee.ee]E gives the geographical coordinates (in degrees) of the target domain's south-west corner, and
- [var1]_[var2]_[var3] are the short names of the variables according to ECMWF's parameter database.

In particular, the following datasets are provided:

1) era5-Y2007-2019M01to12-92x56-3840N0000E-2t_tcc_t850.tar.bz2: The target domain extends from 38.4°N to 54.9°N and 0.0°E to 27.3°E (92x56 grid points). The 2m-temperature (2t), the total cloud cover (tcc), and the 850 hPa temperature (t_850) are included as variables. This data corresponds to Dataset IDs 1-3 in Table A1 of the manuscript.

2) era5-Y2007-2019M01to12-80x48-3960N0180E-2t_tcc_t850.tar.bz2: The target domain extends from 39.6°N to 53.7°N and 1.8°E to 25.5°E (80x48 grid points). The 2t, tcc, and t_850 are included as variables. This data corresponds to Dataset ID 4 in Table A1 of the manuscript.

3) era5-Y2007-2019M01to12-72x44-4020N0300E-2t_tcc_t_850.tar.bz2: The target domain extends from 40.2°N to 53.1°N and 3.0°E to 24.3°E (72x44 grid points). The 2t, tcc, and t_850 are included as variables. This data corresponds to Dataset ID 5 in Table A1 of the manuscript.

4) era5-Y2007-2019M01to12-80x48-3960N0180E-2t_t850.tar.bz2: The target domain extends from 39.6°N to 53.7°N and 1.8°E to 25.5°E (80x48 grid points). Only 2t and t_850 are included. This dataset is a subset of No. 2. This data corresponds to Dataset ID 6 in Table A1 of the manuscript.

5) era5-Y2007-2019M01to12-80x48-3960N0180E-2t.tar.bz2: The target domain extends from 39.6°N to 53.7°N and 1.8°E to 25.5°E (80x48 grid points). Only 2t is included. This dataset is also a subset of No. 2. This data corresponds to Dataset ID 7 in Table A1 of the manuscript.

Data creation

The original ERA5 data can be retrieved from the MARS archive. Once access is granted, the data can be downloaded by specifying a resolution of 0.3° in the retrieval script. The datasets provided in this repository are the processed ERA5 data after the extraction and the two preprocessing steps of the Atmospheric Machine learning Benchmarking System (AMBS) workflow tool (more details are provided in the README of the corresponding code repository). The data is available in TFRecords format, which is used directly in the training step.

Data access and decompression

The data are stored as compressed tar.bz2 archives and are available via: https://datapub.fz-juelich.de/esde/esde-nfs/online_publication/2mT_by_DL/. After downloading, the archives can be unpacked on Linux using:

    tar xjf [filename].tar.bz2

On Windows, the archives can be decompressed with WinZip.
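As a cross-check of the naming convention and the stated domain extents, the archive names can be parsed programmatically. The following is only a minimal sketch: the names `ArchiveInfo`, `parse_archive_name`, and `domain_extent` are illustrative and not part of the AMBS code. It assumes that the coordinates in the published file names are encoded as four digits in hundredths of a degree (inferred from `3840N` matching 38.4°N), and that the 0.3° retrieval resolution is also the grid spacing of the processed data. The variable part is kept as a raw string because the published names are not consistent about underscores (`t850` vs. `t_850`).

```python
import re
from typing import NamedTuple

# Assumption: the 0.3 degree retrieval resolution is also the grid spacing.
GRID_SPACING_DEG = 0.3


class ArchiveInfo(NamedTuple):
    """Meta-information encoded in an archive file name (illustrative type)."""
    years: tuple      # (first year, last year)
    months: tuple     # (first month, last month)
    nx: int           # grid points in longitude direction
    ny: int           # grid points in latitude direction
    lat_sw: float     # latitude of the south-west corner in degrees north
    lon_sw: float     # longitude of the south-west corner in degrees east
    variables: str    # raw variable part, e.g. "2t_tcc_t850"


# Assumption: coordinates are four digits in hundredths of a degree.
ARCHIVE_NAME = re.compile(
    r"^era5-Y(?P<y0>\d{4})-(?P<y1>\d{4})M(?P<m0>\d{2})to(?P<m1>\d{2})"
    r"-(?P<nx>\d+)x(?P<ny>\d+)"
    r"-(?P<lat>\d{4})N(?P<lon>\d{4})E"
    r"-(?P<vars>.+?)(?:\.tar\.bz2)?$",
    re.IGNORECASE,
)


def parse_archive_name(name: str) -> ArchiveInfo:
    """Split an archive file name into its meta-information."""
    match = ARCHIVE_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected archive name: {name!r}")
    return ArchiveInfo(
        years=(int(match["y0"]), int(match["y1"])),
        months=(int(match["m0"]), int(match["m1"])),
        nx=int(match["nx"]),
        ny=int(match["ny"]),
        lat_sw=int(match["lat"]) / 100,
        lon_sw=int(match["lon"]) / 100,
        variables=match["vars"],
    )


def domain_extent(info: ArchiveInfo, spacing: float = GRID_SPACING_DEG) -> tuple:
    """Return (northern latitude, eastern longitude) of the domain."""
    lat_ne = info.lat_sw + (info.ny - 1) * spacing
    lon_ne = info.lon_sw + (info.nx - 1) * spacing
    return round(lat_ne, 2), round(lon_ne, 2)


info = parse_archive_name(
    "era5-Y2007-2019M01to12-92x56-3840N0000E-2t_tcc_t850.tar.bz2"
)
print(info.nx, info.ny, info.variables)  # 92 56 2t_tcc_t850
print(domain_extent(info))               # (54.9, 27.3)
```

Under these assumptions the computed north-east corner matches the extent stated for dataset 1 (54.9°N, 27.3°E), which suggests that the grid-point counts include both domain edges.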
Dataset content

After decompressing, each tar-archive yields the following subdirectory structure:

- tfrecords_seq_len_[sequence_length]: This folder holds the TFRecords files that are streamed to the deep neural networks during training and postprocessing. Each TFRecord file contains 10 samples, where each sample comprises a sequence over [sequence_length] hours.
- pickle: This folder contains the normalized hourly data saved in monthly pickle files (X_[month].pkl). The corresponding timestamps are stored in T_[month].pkl. In addition, statistical information for each month is provided in the files stat_[month].json.
- metadata.json: This file provides important meta-information, including the coordinates of the target domain, the included variables (e.g. 2t and t_850), and the origin of the processed data.
- statsitic.json: This file contains the statistical information (maximum, minimum, and average values) used for normalizing the data. It also includes further information such as the total number of timestamps (nfiles) and the list of JSON files (stat_[month].json) used to compute the statistics.

Data integrity and verification

The tar-archives have been recursively checksummed with the md5 hash function. The resulting checksum file is uploaded with this entry so that users can confirm that the files are intact and unaltered. To verify the integrity of the downloaded data, run the following command:

    find -type f -exec md5sum '{}' \; > md5sum.txt

This generates a single text file that should be identical to the checksum file provided in this entry.

License

Original data by ECMWF. Copyright: "© 2022 European Centre for Medium-Range Weather Forecasts (ECMWF)". Source: www.ecmwf.int. This data is published under the Creative Commons Attribution 4.0 International (CC BY 4.0) license: https://creativecommons.org/licenses/by/4.0/

Contact

Bing Gong (b.gong@fz-juelich.de)
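Where the `md5sum` tool is not available (for example on Windows), the same verification can be approximated in Python. The sketch below is not part of the published dataset tooling; the function names are illustrative. It assumes that the published checksum file uses the usual `md5sum` line layout (hash, separator, path) and that the listed paths are relative to the directory holding the downloaded archives.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the md5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sum_listing(text):
    """Parse md5sum-style lines ('<hash>  <path>') into a {path: hash} dict."""
    expected = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        checksum, _, name = line.partition(" ")
        # md5sum writes a second separator character: ' ' (text) or '*' (binary).
        if name[:1] in (" ", "*"):
            name = name[1:]
        expected[name] = checksum.lower()
    return expected


def find_mismatches(root, listing_text):
    """Return the listed paths that are missing under root or whose md5 differs."""
    root = Path(root)
    mismatches = []
    for name, checksum in parse_md5sum_listing(listing_text).items():
        target = root / name
        if not target.is_file() or md5_of_file(target) != checksum:
            mismatches.append(name)
    return mismatches
```

A possible use, with hypothetical paths, is `find_mismatches("downloads", Path("md5sum.txt").read_text())`; an empty list means that every listed file is present and matches its checksum.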
Provider:
Jülich DATA
Created:
2022-07-26



