
Adaptive pruning for increased robustness and reduced computational overhead in Gaussian process accelerated saddle point searches

Source: DataCite Commons. Updated: 2026-03-12. Indexed: 2026-05-04.
Download link:
https://archive.materialscloud.org/doi/10.24435/materialscloud:3c-2f
Description:
Gaussian process (GP) regression provides a strategy for accelerating saddle point searches on high-dimensional energy surfaces by reducing the number of times the energy and its derivatives with respect to atomic coordinates need to be evaluated. The computational overhead of the hyperparameter optimization can, however, be large and make the approach inefficient. Failures can also occur if the search ventures too far into regions that are not represented well enough by the GP model. Here, these challenges are resolved by using geometry-aware optimal transport measures and an active pruning strategy. The pruning uses a sum of Wasserstein-1 distances, one per atom type, in farthest-point sampling to select a fixed-size subset of geometrically diverse configurations, which avoids the rapidly increasing cost of GP updates as more observations are made. Stability is enhanced by a permutation-invariant metric that provides a reliable trust radius for early stopping and a logarithmic barrier penalty for the growth of the signal variance. These physically motivated algorithmic changes prove their efficacy by reducing the mean computational time to less than half on a set of 238 challenging configurations from a previously published data set of chemical reactions. With these improvements, the GP approach is established as a robust and scalable algorithm for accelerating saddle point searches when the evaluation of the energy and atomic forces requires significant computational effort.

This record contains the complete traces of dimer saddle search runs with the OT-GP (optimal transport GP) framework, including STDOUT and HDF5 trajectories. The record is a companion to the code in the associated GitHub repository and can be used to regenerate the figures and validate the analysis in the accompanying manuscript.
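The pruning idea in the description (farthest-point sampling under a sum of per-atom-type Wasserstein-1 distances) can be illustrated with a small sketch. This is not the code from the associated repository. The function names, the toy coordinates, and the choice to compare raw Cartesian positions of same-type atoms are all assumptions made for illustration. For two equally sized point sets with uniform weights, the Wasserstein-1 distance equals the minimum mean matching cost over all permutations, so the sketch simply enumerates permutations. That is only feasible for a handful of atoms per type, and a practical implementation would use a proper assignment or optimal transport solver.

```python
import itertools
import math

def w1_same_type(a, b):
    """Wasserstein-1 distance between two equally sized, uniformly weighted
    point sets, found by brute force over permutations (tiny sets only)."""
    n = len(a)
    best = min(
        sum(math.dist(a[i], b[p[i]]) for i in range(n))
        for p in itertools.permutations(range(n))
    )
    return best / n

def config_distance(types, x, y):
    """Sum of per-atom-type W1 distances between configurations x and y.
    Assumption: atoms are compared by raw Cartesian position."""
    total = 0.0
    for t in sorted(set(types)):
        idx = [i for i, s in enumerate(types) if s == t]
        total += w1_same_type([x[i] for i in idx], [y[i] for i in idx])
    return total

def farthest_point_subset(types, configs, k, start=0):
    """Greedy farthest-point sampling: return indices of at most k
    configurations, each new pick maximizing its distance to the chosen set."""
    chosen = [start]
    # Distance from every configuration to its nearest chosen configuration.
    nearest = [config_distance(types, c, configs[start]) for c in configs]
    while len(chosen) < min(k, len(configs)):
        candidates = [i for i in range(len(configs)) if i not in chosen]
        nxt = max(candidates, key=lambda i: nearest[i])
        chosen.append(nxt)
        for i, c in enumerate(configs):
            nearest[i] = min(nearest[i], config_distance(types, c, configs[nxt]))
    return chosen

# Hypothetical toy data: two H atoms and one O atom.
types = ["H", "H", "O"]
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
swapped = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # H atoms relabeled
far = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0)]      # O moved by 2.0
mid = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0)]      # O moved by 0.5
configs = [base, swapped, far, mid]

subset = farthest_point_subset(types, configs, k=3)
print(subset)
```

In this toy set, relabeling the two hydrogen atoms gives a distance of zero to the base configuration, which is the permutation invariance the description refers to, so the relabeled copy is the last configuration the greedy selection would keep.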

Provider:
Materials Cloud
Created:
2025-11-18