Overview of procedures applied in machine learning models for sea levels prediction
Source: DataCite Commons. Updated: 2023-11-06. Indexed: 2024-07-13.
Download link:
https://orkg.org/comparison/R649030/
Description:
This comparison gives an overview of the procedures applied when predicting sea levels with machine learning models. The objective is to give a broad representation of the methods and procedures and their characteristics. The overview generally covers all time frames of sea level prediction by machine learning models. It may also include statistical models that are comparable to, or coupled with, machine learning models.
Input data defines which type of data is used as input to the model: in most cases data from observation stations, in some cases outputs from meteorological forecasts, and occasionally other types of data such as satellite data. Number of instances represents the number of samples used in the modeling procedure, while historical dataset length represents the length of the observed historical record.
Real-time prediction implies a prediction (forecast) for the next several hours (5 or 6 hours), short term implies a prediction for the next 3 days (from 5-6 hours to 72 hours), mid term for the next 14 days (from 72 to 336 hours), and long term implies a prediction for more than 14 days ahead. For this comparison, the classification is the one proposed by the author; in general it can vary among researchers and practitioners. Different time steps are applied in the represented contributions (papers): usually 1 hour, but depending on the available data and the research goals, in some cases 1 day and in some cases less than 1 hour.
Properties related to the dataset split define how many splits (of all samples) were used and in what ratios. Endogenous predictors defines whether sea levels (output) were predicted based on sea levels as input, while exogenous predictors defines whether sea levels (output) were predicted based on inputs other than sea levels (air pressure, humidity, wind, temperature, moon phases, etc.) or on sea levels from other observation points (input and output data are at different stations).
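As an illustration of the endogenous and exogenous predictor properties, the following is a minimal sketch of how a feature matrix might be assembled from aligned series. The function name `build_samples`, its parameters (`n_lags`, `lead`, `use_endogenous`, `use_exogenous`), and the sample values are hypothetical and are not taken from any of the compared contributions.

```python
def build_samples(sea_level, exogenous, n_lags=3, lead=1,
                  use_endogenous=True, use_exogenous=True):
    """Build (features, target) pairs from two aligned time series.

    sea_level: list of sea level values at the target station.
    exogenous: list of tuples, one per time step (e.g. air pressure, wind).
    n_lags:    number of most recent sea level values used as inputs.
    lead:      number of time steps ahead to predict.
    All names and defaults here are assumptions made for illustration.
    """
    if len(sea_level) != len(exogenous):
        raise ValueError("series must have the same length")
    features, targets = [], []
    # t is the index of the last observation available at forecast time.
    for t in range(n_lags - 1, len(sea_level) - lead):
        row = []
        if use_endogenous:
            # Past sea levels at the target station, oldest first.
            row.extend(sea_level[t - n_lags + 1 : t + 1])
        if use_exogenous:
            # Other variables observed at time t.
            row.extend(exogenous[t])
        features.append(row)
        targets.append(sea_level[t + lead])
    return features, targets


# Hypothetical hourly data: sea level (m) and air pressure (hPa).
levels = [1.0, 1.1, 1.2, 1.3, 1.4]
pressure = [(1010.0,), (1009.0,), (1008.0,), (1007.0,), (1006.0,)]
X, y = build_samples(levels, pressure, n_lags=2, lead=1)
```

Leaving both flags enabled produces rows that combine past sea levels with the other variables; switching one flag off gives a purely endogenous or purely exogenous setup.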
If both types of inputs were used simultaneously, this is recorded by the property 'simultaneous usage of endogenous and exogenous predictors'. Data scaling is the method applied to scale the input and output data before model training; scaling is commonly applied in machine learning to improve model performance. Data transformation defines whether a specific method was used to transform the data before model training, such as wavelet decomposition, intrinsic mode decomposition and similar. Feature selection defines whether a preceding value or a set of successive preceding input values was chosen manually for prediction, or whether features (inputs) were selected by a specific method (one able to separate more important features from less important ones).
The property 'multioutput or chain procedure' defines whether a model was trained with a single output or with multiple outputs (prediction of the output as a vector with multiple values, or prediction of the output by a chain procedure in which further predictions (e.g. value at t+2, t+3, etc.) are made based on all features and the earlier predictions available in the chain (features + value at t+1 for the value at t+2, features + values at t+1 and t+2 for the value at t+3, etc.)). Models (estimators) defines which machine learning (and in some cases statistical learning) models were applied in the contribution.
Two properties related to the performance metrics define whether at least one of two absolute metrics (mean absolute error and root mean square error) was used in the contribution, and whether at least one of two relative (precision) metrics (coefficient of determination and Nash-Sutcliffe efficiency) was used in the contribution. These specific metrics were chosen for their importance, wide and frequent use, and simplicity in describing model behaviour, as were the next three properties, which relate to the visualization of model performance.
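The four metrics named above can be sketched in plain Python as follows. This is an illustrative sketch rather than the formulation used by any particular contribution: the coefficient of determination is computed here as the squared Pearson correlation, which is one common convention, and the sample values are invented.

```python
import math


def mae(obs, pred):
    """Mean absolute error, in the units of the data."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error, in the units of the data."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the
    skill of always predicting the observed mean.
    Undefined (division by zero) if all observations are equal."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def r_squared(obs, pred):
    """Coefficient of determination, taken here (an assumption) as the
    squared Pearson correlation between observations and predictions."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov ** 2 / (var_obs * var_pred)


# Invented sea levels in metres.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
scores = {
    "MAE": mae(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "NSE": nse(observed, predicted),
    "R2": r_squared(observed, predicted),
}
```

The absolute metrics keep the units of the sea level data, while the two relative metrics are dimensionless, which is why reporting at least one of each kind gives a fuller picture.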
Comparison of model and observations is simply the comparison of observed and predicted values at the same moment in time. Model vs. observation is a scatter plot of observed and predicted values (usually with observed values on the x-axis and predicted values on the y-axis). As the prediction of sea levels is of great significance for predicting potential floods, high tide events defines whether model performance was checked on specific high tide events. This matters not only from the practical point of view of flood prediction: it is also specifically important to check how the model performs in the range of extreme and less frequently recorded values. As no perfect model exists, estimating the uncertainties in prediction can provide more substantive information about model behaviour and improve the forecast of sea levels.
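One way the high tide check described above might be carried out is to compute an error metric only over samples whose observed level exceeds a threshold. The sketch below assumes a fixed threshold and uses root mean square error; the function name `rmse_above_threshold`, the threshold of 1.0 m, and the data are hypothetical.

```python
import math


def rmse_above_threshold(obs, pred, threshold):
    """RMSE restricted to samples whose observed value is at or above
    `threshold`. Returns None when no sample qualifies."""
    pairs = [(o, p) for o, p in zip(obs, pred) if o >= threshold]
    if not pairs:
        return None
    return math.sqrt(sum((o - p) ** 2 for o, p in pairs) / len(pairs))


# Invented sea levels in metres; 1.0 m is an assumed high tide threshold.
observed = [0.2, 0.9, 1.4, 0.3, 1.6]
predicted = [0.25, 0.8, 1.2, 0.3, 1.3]

# A threshold of -infinity keeps every sample, giving the overall RMSE.
overall_rmse = rmse_above_threshold(observed, predicted, float("-inf"))
high_tide_rmse = rmse_above_threshold(observed, predicted, 1.0)
```

In this invented example the error on the high tide samples is larger than the overall error, the kind of gap that an overall score alone would hide.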
Provider:
Open Research Knowledge Graph
Created:
2023-11-06



