Acceleration of Semiempirical Electronic Structure Theory Calculations on Consumer-Grade GPUs Using Mixed-Precision Density Matrix Purification
Source: Figshare | Updated: 2025-07-15 | Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/Acceleration_of_Semiempirical_Electronic_Structure_Theory_Calculations_on_Consumer-Grade_GPUs_Using_Mixed-Precision_Density_Matrix_Purification/29573011
Description:
For semiempirical electronic structure methods, solving the Roothaan–Hall equations to determine the one-electron density matrix is generally the computational bottleneck. Alternatives have therefore been proposed that solve directly for the one-electron density matrix without first computing the orbitals. In this work, we present an efficient dense linear algebra implementation of Niklasson’s density matrix purification schemes on graphics processing units (GPUs). The computational bottleneck in these methods is the matrix–matrix multiplication needed to construct the purification polynomials, which can be accelerated with GPUs. Of particular interest in this work are consumer-grade GPUs, which perform best with algorithms that maximize the share of single-precision (FP32) operations. We therefore present a tailored mixed-precision (MP) scheme that exploits much of the FP32 performance of these GPUs without sacrificing numerical accuracy in the self-consistent field (SCF) calculations. We demonstrate that, in combination with the semiempirical GFN2-xTB method, our MP implementation is faster than diagonalization-based density matrix builds with LAPACK (Intel oneMKL DSYGVD) and cuSOLVER DSYGVD for molecules with more than 1000 basis functions. At the same time, the numerical precision of the energies and gradients is not significantly affected by the MP scheme compared to a full double-precision (FP64) treatment. This enables significant speedups of semiempirical calculations on commodity computing hardware. Going further, we show that our asynchronous GPU implementation can run multiple SCFs in parallel on a single GPU, which we use to accelerate state-of-the-art conformational sampling procedures based on molecular dynamics and metadynamics simulations.
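The purification idea can be illustrated with a minimal NumPy sketch of the trace-correcting second-order spectral projection (SP2) scheme, one of Niklasson's purification methods. This is not the authors' code. It assumes a closed-shell, zero-temperature system whose Hamiltonian is already expressed in an orthogonal basis (GFN2-xTB uses a nonorthogonal basis, so a transformation such as Löwdin orthogonalization would be needed first), and the function names `gershgorin_bounds` and `sp2_density` are hypothetical. The `mixed_precision` flag is a deliberately simplistic stand-in for the paper's tailored MP scheme: only the X @ X product is computed in FP32, while X, the traces, and the polynomial update stay in FP64.

```python
import numpy as np


def gershgorin_bounds(h):
    """Cheap lower/upper bounds on the spectrum of a symmetric matrix (Gershgorin circles)."""
    d = np.diag(h)
    r = np.sum(np.abs(h), axis=1) - np.abs(d)
    return float(np.min(d - r)), float(np.max(d + r))


def sp2_density(h, n_occ, tol=None, max_iter=100, mixed_precision=False):
    """Trace-correcting SP2 purification (sketch, orthogonal basis assumed).

    Returns an approximate projector P onto the n_occ lowest eigenvectors of h,
    so tr(P) = n_occ; a closed-shell density matrix would be 2 * P.
    With mixed_precision=True only the X @ X product runs in float32
    (a simplistic illustration, not the scheme described in the paper).
    """
    if tol is None:
        # FP32 round-off limits how small the idempotency error can get.
        tol = 1e-4 if mixed_precision else 1e-10
    e_min, e_max = gershgorin_bounds(h)
    n = h.shape[0]
    # Map the spectrum into [0, 1], reversed so occupied (low-energy) states
    # end up at the upper end of the interval.
    x = (e_max * np.eye(n) - h) / (e_max - e_min)
    for _ in range(max_iter):
        # The only matrix-matrix product per iteration: the GPU-heavy step.
        if mixed_precision:
            x32 = x.astype(np.float32)
            x2 = (x32 @ x32).astype(np.float64)
        else:
            x2 = x @ x
        tr_x, tr_x2 = np.trace(x), np.trace(x2)
        # tr(X - X^2) = sum of lambda * (1 - lambda), which vanishes for a projector.
        if abs(tr_x - tr_x2) < tol:
            break
        # Apply whichever polynomial (X^2 or 2X - X^2) brings the trace closer to n_occ.
        if abs(tr_x2 - n_occ) < abs(2.0 * tr_x - tr_x2 - n_occ):
            x = x2
        else:
            x = 2.0 * x - x2
    return x
```

Both polynomials are monotone on [0, 1], so the ordering of eigenvalues is preserved and, given a HOMO–LUMO gap, the iteration drives occupied eigenvalues to 1 and virtual ones to 0 without ever computing orbitals. Comparing the result with a diagonalization-based projector (for example from `numpy.linalg.eigh`) is a simple way to check the MP error in such a sketch.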
Created: 2025-07-15



