
Overview of procedures applied in machine learning models for long term prediction of streamflows

DataCite Commons | Updated 2023-11-25 | Indexed 2024-07-13
Download link:
https://orkg.org/comparison/R653336/
Description:
The comparison gives an overview of procedures applied in the long-term prediction of streamflows (discharges) by machine learning models. The objective is to give a broader representation of the methods and procedures and their characteristics. Long term generally implies prediction more than 14 days ahead. Besides machine learning models, the comparison may also include statistical models that are comparable to or coupled with machine learning models.

Input data defines which type of data is used as input to the model. In most cases it is data from observation stations, but in some cases outputs from meteorological/climatological forecasts are used, and other types of data, such as satellite data, may also be used. Number of instances represents the number of samples used in the modeling procedure, while historical dataset length represents the length of the observed historical record. A time step of one month is generally applied in the represented contributions (papers).

Time framework defines how far ahead the models were applied for prediction. Models can be applied for hindcasting and forecasting. Forecasts can be applied for various time frameworks, from prediction for the current month to predictions several months ahead. Notably, the time framework has to be considered together with whether exogenous inputs, endogenous inputs, or both types simultaneously were used. If exclusively exogenous inputs (precipitation, temperature and other climatological data) were used for modeling, those models can be further applied for long-term prediction based on outputs from climatological models, which are then used as the inputs (precipitation, temperature, etc.). With exclusively endogenous input variables (prediction of streamflows based on preceding streamflows as inputs), it is not possible to further apply the models for prediction based on climatological models. Nor is it possible when exogenous and endogenous variables are used simultaneously.
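The distinction between endogenous and exogenous inputs can be illustrated with a minimal sketch. It is not taken from any of the compared papers: the function name `build_samples`, the choice of two lags, the use of same-month precipitation as the only exogenous input, and the numbers are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def build_samples(
    flow: Sequence[float],
    precip: Sequence[float],
    n_lags: int = 2,
    use_endogenous: bool = True,
    use_exogenous: bool = True,
) -> Tuple[List[List[float]], List[float]]:
    """Return (X, y) for predicting flow[t] at a monthly time step.

    Endogenous inputs: flow[t - n_lags] .. flow[t - 1] (preceding streamflows).
    Exogenous inputs:  precip[t] (same-month precipitation).
    Samples always start at t = n_lags so that the targets are identical
    for every input configuration, even when no lags are used.
    """
    if len(flow) != len(precip):
        raise ValueError("flow and precip must have the same length")
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(n_lags, len(flow)):
        row: List[float] = []
        if use_endogenous:
            row.extend(flow[t - n_lags:t])
        if use_exogenous:
            row.append(precip[t])
        X.append(row)
        y.append(flow[t])
    return X, y


# Hypothetical monthly values, for illustration only.
flow = [10.0, 12.0, 15.0, 11.0, 9.0]
precip = [50.0, 60.0, 80.0, 40.0, 30.0]

X_exo, y_exo = build_samples(flow, precip, use_endogenous=False)
X_endo, y_endo = build_samples(flow, precip, use_exogenous=False)
X_both, y_both = build_samples(flow, precip)
```

Only `X_exo` can be rebuilt from climatological model output alone. `X_endo` and `X_both` need observed streamflows for the preceding months, which are not available for months that lie in the future.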
Therefore, if models developed on exogenous variables are used for current-month prediction (which they generally should be during development), they can be applied to longer time frameworks in further development, based on downscaling from climatological models, that is, climatological projections.

Properties related to dataset split define how many splits (of all samples) were used and in what ratios. Endogenous predictors defines whether the dependent variable (output) was predicted based on the variable itself as input (streamflows predicted from preceding streamflows), while exogenous predictors defines whether the dependent variable (output) was predicted based on inputs other than itself (precipitation, temperature, etc.) or possibly from other observation points (input and output data are from different stations). If both types of inputs were used simultaneously, this is recorded by the property 'simultaneous usage of endogenous and exogenous predictors'.

Data scaling is the method applied to scale input and output data before model training; it is commonly applied in machine learning to improve model performance. Data transformation defines whether some specific method was used to transform the data before model training, such as wavelet decomposition, intrinsic mode decomposition and similar. Feature selection defines whether a preceding value or a set of successive preceding input values was manually chosen for prediction, or whether features (inputs) were selected by some specific method (able to separate more important features from less important ones).

Property 'multioutput or chain procedure' defines whether a model was trained with a single output or with multiple outputs: prediction of the output as a vector with multiple values, or prediction of the output by a chain procedure in which further predictions (e.g. value at t+2, t+3, etc.) are made based on all features and the earlier predictions available in the chain (features + value at t+1 for the value at t+2, features + values at t+1 and t+2 for the value at t+3, etc.). Models (estimators) defines which machine learning (and in some cases statistical learning) models were applied in the contribution. Property ensemble technique defines whether some kind of ensemble model (a combination of two or more models in a single model in order to improve performance) was applied in the procedure.

Two properties related to the performance metrics define whether at least one of two absolute metrics (mean absolute error and root mean square error) was used in the contribution, and whether at least one of two relative (precision) metrics (coefficient of determination and Nash-Sutcliffe efficiency coefficient) was used. These metrics were chosen for their importance, wide and frequent usage, and simplicity in describing model behaviour, as were the next three properties, which relate to the visualization of model performance. Comparison of model and observations is simply the comparison of observed and predicted values at the same moment in time. Model vs. observation is a scatter plot of observed and predicted values (usually with observed values on the x-axis and predicted values on the y-axis). As models generally tend to underestimate peak flows (and sometimes overestimate low flows), property peak flows defines whether model performance was checked on specific peak flows. It is especially important to check how a model performs in the range of extreme and less frequently recorded values. As no perfect model exists, estimating the uncertainties in prediction can provide more substantive information about model behaviour and improve the forecast of streamflow (discharge).
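The four performance metrics named above can be written out in a few lines. This is a minimal sketch with hypothetical data, not the code used in any of the compared papers. One assumption is worth stating: the coefficient of determination is computed here as the squared Pearson correlation between observed and predicted values. Some libraries define it differently, in a form identical to the Nash-Sutcliffe efficiency.

```python
import math
from typing import Sequence


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error (absolute metric, in the units of the flow)."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error (absolute metric, in the units of the flow)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of the observations."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination, taken here (an assumption) as the
    squared Pearson correlation between observations and predictions."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov ** 2 / (var_obs * var_pred)


# Hypothetical values: every prediction is 0.5 too high.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

scores = {
    "MAE": mae(observed, predicted),         # 0.5
    "RMSE": rmse(observed, predicted),       # 0.5
    "NSE": nse(observed, predicted),         # 0.8
    "R2": r_squared(observed, predicted),    # 1.0
}
```

In this example the constant bias lowers NSE to 0.8, while the squared correlation stays at 1.0 because it is insensitive to a constant offset. That difference is one reason to report an absolute metric alongside a relative one.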
Provider:
Open Research Knowledge Graph
Created:
2023-11-25