Advanced Techniques for High-Performance Fock Matrix Construction on GPU Clusters
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Advanced_Techniques_for_High-Performance_Fock_Matrix_Construction_on_GPU_Clusters/27905300
Description:
This Article presents two optimized multi-GPU algorithms for Fock
matrix construction, building on the work of Ufimtsev and Martinez
[J. Chem. Theory Comput. 2009, 5, 1004–1015] and Barca et al. [J. Chem. Theory Comput. 2021, 17, 7486–7503]. The novel algorithms, opt-UM and opt-Brc,
introduce significant enhancements, including improved integral screening,
exploitation of sparsity and symmetry, a linear-scaling exchange matrix
assembly algorithm, and extended capabilities for Hartree–Fock
calculations up to f-type angular momentum functions.
Opt-Brc excels for smaller systems and for highly contracted triple-ζ
basis sets, while opt-UM is advantageous for large molecular systems.
Performance benchmarks on NVIDIA A100 GPUs show that our algorithms
in the EXtreme-scale Electronic Structure System (EXESS), when combined, outperform all current GPU and CPU Fock build implementations
in TeraChem, QUICK, GPU4PySCF, LibIntX, ORCA, and Q-Chem. The implementations
were benchmarked on linear and globular systems, and average speedups
across three double-ζ basis sets of 1.4×, 8.4×,
and 9.4× were observed compared to TeraChem, QUICK, and GPU4PySCF, respectively. An increased average speedup of 2.1× over TeraChem is observed when using four A100 GPUs. Strong
scaling analysis reveals over 91% parallel efficiency on four GPUs
for opt-Brc, making it typically faster for multi-GPU execution. Single-compute-node
comparisons with CPU-based software like ORCA and Q-Chem show speedups of up to 42×
and 31×, respectively, enhancing power efficiency by up to 18×.
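
For readers unfamiliar with the terms used above, the following is a minimal NumPy sketch of a closed-shell Fock build, F = h + J − ½K, with textbook Schwarz integral screening. It is not the opt-UM or opt-Brc algorithm and does not reflect the EXESS code. The function names, the toy factorized integral tensor, and the threshold `tau` are all assumptions introduced for illustration. Production codes evaluate integrals on the fly in shell batches instead of storing the full four-index tensor.

```python
import numpy as np


def make_model_eri(n, naux=10, seed=0):
    """Toy ERI tensor (pq|rs) = sum_A B[p,q,A] * B[r,s,A].

    Assumption for illustration only: this factorized form guarantees the
    permutational symmetry and positive semidefiniteness that the Schwarz
    bound relies on. Real ERIs come from an integral engine.
    """
    rng = np.random.default_rng(seed)
    b = rng.standard_normal((n, n, naux))
    b = 0.5 * (b + b.transpose(1, 0, 2))           # symmetric in (p, q)
    idx = np.arange(n)
    decay = np.exp(-2.0 * np.abs(idx[:, None] - idx[None, :]))
    b *= decay[:, :, None]                          # mimic decay with distance
    return np.einsum("pqA,rsA->pqrs", b, b)


def fock_reference(h, eri, d):
    """Closed-shell Fock matrix F = h + J - 0.5 K with no screening."""
    j = np.einsum("pqrs,rs->pq", eri, d)            # Coulomb:  J_pq = (pq|rs) D_rs
    k = np.einsum("prqs,rs->pq", eri, d)            # exchange: K_pq = (pr|qs) D_rs
    return h + j - 0.5 * k


def fock_screened(h, eri, d, tau=1e-8):
    """Same build, dropping integrals whose Schwarz bound is below tau.

    Schwarz inequality: |(pq|rs)| <= Q_pq * Q_rs with Q_pq = sqrt((pq|pq)).
    Multiplying by max|D| bounds the largest possible Fock contribution.
    """
    q = np.sqrt(np.abs(np.einsum("pqpq->pq", eri)))
    bound = q[:, :, None, None] * q[None, None, :, :] * np.abs(d).max()
    keep = bound >= tau
    kept = np.where(keep, eri, 0.0)
    skipped_fraction = 1.0 - keep.mean()
    return fock_reference(h, kept, d), skipped_fraction


def make_inputs(n=8, nocc=3, seed=1):
    """Hypothetical test inputs: random symmetric h and an idempotent-like D."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((n, n))
    h = 0.5 * (h + h.T)
    c, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthonormal "orbitals"
    d = 2.0 * c[:, :nocc] @ c[:, :nocc].T              # closed-shell density
    return h, make_model_eri(n), d


if __name__ == "__main__":
    h, eri, d = make_inputs()
    f_ref = fock_reference(h, eri, d)
    f_scr, frac = fock_screened(h, eri, d)
    print("fraction of integrals skipped:", frac)
    print("max abs Fock error:", np.abs(f_scr - f_ref).max())
```

Because every dropped term contributes less than `tau` to any Fock element, the screened matrix differs from the reference by at most 1.5·n²·`tau` per element in this sketch. The tighter screening described in the abstract goes beyond this simple bound.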
Created:
2024-11-25



