Predicting Seabed Mud Content across the Australian Margin: Comparison of Statistical and Mathematical Techniques Using a Simulation Experiment
Research Data Australia (indexed 2024-12-14)
Download link:
https://researchdata.edu.au/predicting-seabed-mud-simulation-experiment/683368
Description:
In this study, we conducted a simulation experiment to identify robust spatial interpolation methods, using samples of seabed mud content from Geoscience Australia's Marine Samples database. Because the samples contain data noise, criteria for data quality control were developed and applied. Five factors that affect the accuracy of spatial interpolation were considered:
1) regions;
2) statistical methods;
3) sample densities;
4) search neighbourhoods; and
5) sample stratification.
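Two of these factors, the statistical method and the search neighbourhood, are easy to see in the control method used in the study, inverse distance squared (IDS). The following is a minimal, stdlib-only Python sketch, not the authors' implementation. It assumes samples are `(x, y, value)` tuples in a projected planar coordinate system and defines the search neighbourhood as the `k` nearest samples; the function name `idw_predict` and its `k` and `power` parameters are hypothetical.

```python
import math

def idw_predict(samples, target, k=None, power=2.0):
    """Inverse distance weighted prediction at `target` (IDS when power=2).

    samples: list of (x, y, value) tuples in a projected, planar coordinate
             system (assumption: real survey positions would need projecting).
    target:  (x, y) location to predict.
    k:       search neighbourhood size, i.e. how many nearest samples are used;
             None uses every sample (hypothetical parameter).
    """
    if not samples or (k is not None and k < 1):
        raise ValueError("need at least one sample in the search neighbourhood")
    tx, ty = target
    # Pair each sample value with its distance to the target, nearest first.
    ranked = sorted((math.hypot(x - tx, y - ty), v) for x, y, v in samples)
    if k is not None:
        ranked = ranked[:k]  # keep only the search neighbourhood
    # If the nearest sample sits exactly on the target, return its value.
    if ranked[0][0] == 0.0:
        return ranked[0][1]
    weights = [1.0 / d ** power for d, _ in ranked]
    return sum(w * v for w, (_, v) in zip(weights, ranked)) / sum(weights)

# Toy example (made-up values, not data from the study).
mud = [(0.0, 0.0, 10.0), (1.0, 0.0, 20.0), (0.0, 1.0, 30.0), (5.0, 5.0, 90.0)]
print(idw_predict(mud, (0.5, 0.5), k=3))  # three equidistant samples only
print(idw_predict(mud, (0.5, 0.5)))       # the distant sample also contributes
```

With `k=3` the prediction equals the mean of the three equidistant samples (20, up to floating-point rounding). Widening the neighbourhood lets the distant high-mud sample pull the estimate upward, which illustrates how search window size can change accuracy.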
Bathymetry, distance-to-coast and slope were used as secondary variables. Prediction accuracy was assessed by ten-fold cross-validation using four measures: mean absolute error, root mean square error, relative mean absolute error (RMAE) and relative root mean square error. The effects of these factors on prediction accuracy were analysed using generalised linear models. Prediction accuracy depends on the method, sample density, sample stratification, search window size, data variation and study region, and no single method was superior in all scenarios. Three sub-methods were more accurate than the control (inverse distance squared) in each of the north and northeast regions, and 12 sub-methods were more accurate in the southwest region. A combined method of random forest and ordinary kriging (RKrf) was the most robust, judged by both prediction accuracy and visual examination of the prediction maps. This method is novel, and its RMAE is up to 17% lower than that of the control. The RMAE of the best method is 15% lower in two regions, and 30% lower in the remaining region, than that of the best methods in previously published studies, further highlighting the robustness of the methods developed. The outcomes of this study can be applied to modelling a wide range of physical properties to improve marine biodiversity prediction. The limitations of the study are discussed, and suggestions for further studies are provided.
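The cross-validation and accuracy measures can be sketched in a few lines of stdlib-only Python. This is an illustration under stated assumptions, not the authors' code. It assumes RMAE and RRMSE are MAE and RMSE expressed as a percentage of the mean observed value (a common convention, but the exact definition should be checked against the original paper), and it pools the held-out predictions from all ten folds before scoring instead of averaging per-fold scores. The names `accuracy`, `ten_fold_cv` and `nearest_neighbour` are hypothetical, and the nearest-neighbour predictor is only a stand-in for the methods compared in the study.

```python
import math
import random

def accuracy(observed, predicted):
    """MAE, RMSE and relative forms (assumed: percent of the observed mean)."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_obs = sum(observed) / n
    return {
        "MAE": mae,
        "RMSE": rmse,
        "RMAE": 100.0 * mae / mean_obs,
        "RRMSE": 100.0 * rmse / mean_obs,
    }

def ten_fold_cv(samples, predict, folds=10, seed=0):
    """Score `predict(train, (x, y))` by k-fold cross-validation.

    Every sample is held out exactly once; held-out predictions from all
    folds are pooled and scored together (an assumption, see lead-in).
    """
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)  # fixed seed -> identical folds per method
    observed, predicted = [], []
    for f in range(folds):
        held_out = idx[f::folds]  # every `folds`-th shuffled index forms a fold
        held = set(held_out)
        train = [s for i, s in enumerate(samples) if i not in held]
        for i in held_out:
            x, y, v = samples[i]
            observed.append(v)
            predicted.append(predict(train, (x, y)))
    return accuracy(observed, predicted)

def nearest_neighbour(train, xy):
    # Hypothetical stand-in predictor: value of the closest training sample.
    return min(train, key=lambda s: math.hypot(s[0] - xy[0], s[1] - xy[1]))[2]

# Toy 10 x 10 grid of made-up values (not data from the study).
toy = [(float(x), float(y), 10.0 + x + y) for x in range(10) for y in range(10)]
print(ten_fold_cv(toy, nearest_neighbour))
```

Any predictor with the same `predict(train, (x, y))` signature can be passed in. Keeping `seed` fixed gives every method the same folds, so their scores are directly comparable.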
Provided by:
Australian Ocean Data Network



