Gaussian Process kernels comparison - Datasets and python code

Name: Gaussian Process kernels comparison - Datasets and python code
Creator: figshare.unimelb.edu.au
Published: 2024-06-24 00:00:00
License: no description available

figshare.unimelb.edu.au, updated 2024-06-24, indexed 2025-03-25

Download link:

https://figshare.unimelb.edu.au/articles/dataset/Gaussian_Process_kernels_comparison_-_Datasets_and_python_code/26087719/1

Resource description:

Overview
Data used for the publication "Comparing Gaussian Process Kernels Used in LSG Models for Flood Inundation Predictions". We investigate the impact of 13 Gaussian Process (GP) kernels, five single kernels and eight composite kernels, on the prediction accuracy and computational efficiency of the Low-fidelity, Spatial analysis, and Gaussian process learning (LSG) modelling approach. The GP kernels are compared for three distinct case studies: Carlisle (United Kingdom), Chowilla floodplain (Australia), and Burnett River (Australia). The high- and low-fidelity model simulation results are obtained from the data repository: Fraehr, N. (2024, January 19). Surrogate flood model comparison - Datasets and python code (Version 1). The University of Melbourne. https://doi.org/10.26188/24312658.v1

Dataset structure
The dataset is structured in five folders: Carlisle, Chowilla, BurnettRV, Comparison_results, and Python_data. The first three folders contain simulation data and analysis code. The "Comparison_results" folder contains plotting code, figures and tables for the comparison results. The "Python_data" folder contains the LSG model functions and the Python environment requirements.

Carlisle, Chowilla, and BurnettRV
These folders contain high- and low-fidelity hydrodynamic modelling data for training and validation for each individual case study, as well as case-specific Python scripts for training and running the LSG model with different GP kernels. The folders differ only slightly, depending on the hydrodynamic model simulation results and the EOF analysis results. Each case study folder contains the following:

Geometry_data: DEM files; .npz files containing the high-fidelity model grid (XYZ coordinates) and areas (the same data is available for the low-fidelity model used in the LSG model); .shp files indicating the location of boundaries and main flow paths.
XXX_modeldata: Folder storing the trained model data for each XXX-kernel LSG model. For example, EXP_modeldata contains the files storing the trained LSG model that uses the exponential GP kernel. ME3LIN means ME3 + LIN; ME3mLIN means ME3 × LIN. EXPLow, EXPMid, EXPHigh and EXPFULL denote sparse GP models with an inducing-point percentage of 5%, 15%, 35% and 100%, respectively.
HD_model_data: High-fidelity simulation results for all flood events of that case study, low-fidelity simulation results for all flood events of that case study, and all boundary input conditions.
HF_EOF_analysis: Data used in the EOF analysis for the LSG model.
Results_data: Results of evaluating the LSG models with the different GP kernel candidates.
Train_test_split_data: The train-test-validation split is the same for all LSG models with different GP kernel candidates. The specific split for each cross-validation fold is stored in this folder.
YYY_event_summary.csv, YYY_Extrap_event_summary.csv: Overview of all events, and of which events are connected between the low- and high-fidelity models, for each YYY case study.
EOF_analysis_HFdata_preprocessing.py, EOF_analysis_HFdata.py: Preprocessing before the EOF analysis, and the EOF analysis of the high-fidelity data.
Evaluation.py, Evaluation_extrap.py: Scripts for evaluating the LSG model for that case study and saving the results for each cross-validation fold.
train_test_split.py: Script for splitting the flood datasets for each cross-validation fold, so that all LSG models with different GP kernel candidates are trained on the same data.
XXX_training.py: Script for training each LSG model using the XXX GP kernel. The naming follows the same convention: ME3LIN means ME3 + LIN; ME3mLIN means ME3 × LIN; EXPLow, EXPMid, EXPHigh and EXPFULL use an inducing-point percentage of 5%, 15%, 35% and 100%, respectively.
XXX_training.bat: Batch scripts for training all LSG models using the different GP kernel candidates.

Comparison_results
Files used for comparing the LSG models with different GP kernel candidates and for generating the figures in the paper "Comparing Gaussian Process Kernels Used in LSG Models for Flood Inundation Predictions". The figures are also included.

Python_data
Folder containing Python scripts with utility functions for setting up, training, and running the LSG models, as well as for evaluating them.

Python environment
This folder also contains two Python environment files listing all Python package versions and dependencies. You can install either the CPU or the GPU version of the environment. The GPU version can use the GPU to speed up GPflow training; it installs the CUDA and cuDNN packages. You can install the environment online or offline. Offline installation reduces dependency issues, but it requires that you use Windows 10, the same operating system as I do.

Online installation
LSG_CPU_environment.yml: Python environment for running the LSG models on the computer's CPU.
LSG_GPU_environment.yml: Python environment for running the LSG models on the computer's GPU, mainly to speed up GPflow training. It requires the CUDA and cuDNN packages.
In the directory containing the .yml file, enter the following command in a console:
conda env create -f LSG_CPU_environment.yml -n myenv_name
or
conda env create -f LSG_GPU_environment.yml -n myenv_name

Offline installation
If you also use Windows 10, you can directly unpack the environments packed with conda-pack.
LSG_CPU.tar.gz: Archive containing all packages of the CPU-only virtual environment.
LSG_GPU.tar.gz: Archive containing all packages of the GPU-accelerated virtual environment.
On Windows, create a new LSG_CPU or LSG_GPU folder in the Anaconda environments folder and extract LSG_CPU.tar.gz or LSG_GPU.tar.gz into that folder:
tar -xzvf LSG_CPU.tar.gz -C ./LSG_CPU
or
tar -xzvf LSG_GPU.tar.gz -C ./LSG_GPU
Change to the environment path:
cd ./LSG_GPU
Activate the environment:
.\Scripts\activate.bat
Remove the path prefixes from the activated environment:
.\Scripts\conda-unpack.exe
Exit the environment:
.\Scripts\deactivate.bat

LSG_mods_and_func: Python scripts for using the LSG model.
Evaluation_metrics.py: Metrics used to evaluate the prediction accuracy and computational efficiency of the LSG models.
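To make the kernel naming convention concrete, here is a minimal, dependency-free sketch of how the folder and script abbreviations could map to kernel functions. It assumes that EXP is the exponential kernel (as stated above), that ME3 denotes the Matérn 3/2 kernel and LIN the linear kernel (an interpretation, not confirmed by the dataset description), and that the inducing-point percentage is taken relative to the number of training points. The names SINGLE, SPARSE_LEVELS, kernel_from_name and n_inducing are hypothetical and do not come from the dataset's scripts.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def _dist(x: Vector, y: Vector) -> float:
    """Euclidean distance between two input vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def exponential(lengthscale: float = 1.0, variance: float = 1.0) -> Kernel:
    """Exponential kernel: variance * exp(-r / lengthscale)."""
    return lambda x, y: variance * math.exp(-_dist(x, y) / lengthscale)


def matern32(lengthscale: float = 1.0, variance: float = 1.0) -> Kernel:
    """Matern 3/2 kernel (ASSUMED meaning of 'ME3')."""
    def k(x: Vector, y: Vector) -> float:
        s = math.sqrt(3.0) * _dist(x, y) / lengthscale
        return variance * (1.0 + s) * math.exp(-s)
    return k


def linear(variance: float = 1.0) -> Kernel:
    """Linear kernel (ASSUMED meaning of 'LIN'): variance * <x, y>."""
    return lambda x, y: variance * sum(a * b for a, b in zip(x, y))


# Hypothetical registry of single kernels, keyed by the abbreviations in the folder names.
SINGLE: Dict[str, Callable[[], Kernel]] = {"EXP": exponential, "ME3": matern32, "LIN": linear}

# Inducing-point percentages for the sparse GP variants, as listed in the dataset description.
SPARSE_LEVELS: Dict[str, int] = {"Low": 5, "Mid": 15, "High": 35, "FULL": 100}


def kernel_from_name(name: str) -> Kernel:
    """Build a kernel from a name such as 'ME3LIN' (sum) or 'ME3mLIN' (product)."""
    if name in SINGLE:
        return SINGLE[name]()
    if "m" in name:  # lowercase 'm' marks a product, e.g. ME3mLIN = ME3 x LIN
        left, right = name.split("m", 1)
        k1, k2 = kernel_from_name(left), kernel_from_name(right)
        return lambda x, y: k1(x, y) * k2(x, y)
    for i in range(1, len(name)):  # otherwise read it as a sum, e.g. ME3LIN = ME3 + LIN
        left, right = name[:i], name[i:]
        if left in SINGLE and right in SINGLE:
            k1, k2 = SINGLE[left](), SINGLE[right]()
            return lambda x, y: k1(x, y) + k2(x, y)
    raise ValueError(f"unrecognised kernel name: {name!r}")


def n_inducing(n_train: int, level: str) -> int:
    """Number of inducing points for a sparse GP variant such as EXPLow or EXPMid."""
    return max(1, round(n_train * SPARSE_LEVELS[level] / 100))
```

In GPflow, which the environment files use for training, composite kernels are normally built with the same two operators, for example gpflow.kernels.Matern32() + gpflow.kernels.Linear() for a sum and gpflow.kernels.Matern32() * gpflow.kernels.Linear() for a product. How the dataset's XXX_training.py scripts actually construct their kernels is not shown in this sketch.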

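The purpose of train_test_split.py, that every kernel candidate is trained, validated and tested on identical events in each cross-validation fold, can be illustrated with a seeded k-fold split. This is a hypothetical sketch: the function name make_cv_splits, the default of five folds, the seed, and the rule that the fold following the test fold serves as validation are all assumptions, not the scheme used in the dataset.

```python
import random
from typing import Dict, List, Sequence


def make_cv_splits(event_ids: Sequence[str], n_folds: int = 5,
                   seed: int = 42) -> List[Dict[str, List[str]]]:
    """Hypothetical k-fold train/validation/test split of flood events.

    A fixed seed makes the split reproducible, so every GP kernel
    candidate sees exactly the same training, validation and test events.
    """
    if n_folds < 3:
        raise ValueError("need at least 3 folds for train/validation/test")
    ids = sorted(event_ids)           # canonical order, independent of input order
    random.Random(seed).shuffle(ids)  # local RNG, leaves the global random state untouched
    folds = [ids[i::n_folds] for i in range(n_folds)]

    splits = []
    for i in range(n_folds):
        val_idx = (i + 1) % n_folds   # assumed rule: the next fold is the validation set
        train = [e for j, fold in enumerate(folds) if j not in (i, val_idx) for e in fold]
        splits.append({"train": train, "validation": folds[val_idx], "test": folds[i]})
    return splits
```

Because the split depends only on the event IDs and the seed, calling the function from each kernel's training script yields the same folds. Writing each fold's dictionary to disk, for example with json.dump, would let later runs reload the stored split instead of recomputing it.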

Provider:

figshare.unimelb.edu.au