TTDFT: A GPU accelerated Tucker tensor DFT code for large-scale Kohn-Sham DFT calculations
Source: doi.org. Indexed: 2025-03-25
Download link:
http://doi.org/10.17632/8dgmcs8ys2.1
Resource description:
We present the Tucker tensor DFT (TTDFT) code, which uses a tensor-structured algorithm with graphics processing unit (GPU) acceleration to conduct ground-state DFT calculations on large-scale systems. The Tucker tensor DFT algorithm uses a localized Tucker tensor basis computed from an additive separable approximation to the Kohn-Sham Hamiltonian. The discrete Kohn-Sham problem is solved using the Chebyshev filtered subspace iteration method, which relies on matrix-matrix multiplications of a sparse symmetric Hamiltonian matrix and a dense wavefunction matrix, both expressed in the localized Tucker tensor basis. These matrix-matrix multiplications, which constitute the most computationally intensive step of the solution procedure, are GPU accelerated, providing a ∼8-fold GPU-CPU speedup for these operations on the largest systems studied. The computational performance of the TTDFT code is demonstrated through benchmark studies on aluminum nanoparticles and silicon quantum dots, with system sizes ranging up to ∼7,000 atoms.
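The description names Chebyshev filtered subspace iteration, built on repeated products of a sparse symmetric Hamiltonian with a dense wavefunction block. The following is a minimal CPU sketch of that general technique in NumPy, not the TTDFT implementation. It rests on several assumptions: the Hamiltonian is a toy symmetric tridiagonal matrix (a 1D finite-difference operator) standing in for the Hamiltonian in the localized Tucker tensor basis, the Chebyshev recurrence is the plain unscaled three-term form, and all function and parameter names (`apply_hamiltonian`, `chebyshev_filter`, `rayleigh_ritz`, `chfsi_lowest`, `n_want`, `n_sub`) are hypothetical.

```python
import numpy as np


def apply_hamiltonian(diag, off, X):
    """Multiply a symmetric tridiagonal H by a dense block X.

    H has main diagonal `diag` and a constant off-diagonal `off`.
    This toy operator is an assumption standing in for the sparse Hamiltonian.
    """
    Y = diag[:, None] * X
    Y[1:] += off * X[:-1]   # sub-diagonal contribution
    Y[:-1] += off * X[1:]   # super-diagonal contribution
    return Y


def chebyshev_filter(diag, off, X, degree, a, b):
    """Apply T_degree((H - c) / e) to X (unscaled recurrence).

    Components with eigenvalues in [a, b] stay bounded by 1 in magnitude,
    while components with eigenvalues below a are amplified.
    """
    e = 0.5 * (b - a)
    c = 0.5 * (b + a)
    prev = X
    curr = (apply_hamiltonian(diag, off, X) - c * X) / e
    for _ in range(2, degree + 1):
        nxt = 2.0 * (apply_hamiltonian(diag, off, curr) - c * curr) / e - prev
        prev, curr = curr, nxt
    return curr


def rayleigh_ritz(diag, off, X):
    """Project H onto span(X) (X orthonormal) and rotate X to Ritz vectors."""
    Hs = X.T @ apply_hamiltonian(diag, off, X)
    Hs = 0.5 * (Hs + Hs.T)  # remove round-off asymmetry
    ritz, V = np.linalg.eigh(Hs)
    return ritz, X @ V


def chfsi_lowest(diag, off, n_want, n_sub, degree=20, n_iter=20, seed=0):
    """Approximate the n_want lowest eigenpairs using an n_sub-dimensional subspace."""
    rng = np.random.default_rng(seed)
    n = diag.size
    b = np.max(diag) + 2.0 * abs(off)  # Gershgorin upper bound on the spectrum
    X, _ = np.linalg.qr(rng.standard_normal((n, n_sub)))
    ritz, X = rayleigh_ritz(diag, off, X)
    for _ in range(n_iter):
        a = ritz[-1]  # damp everything above the largest current Ritz value
        X = chebyshev_filter(diag, off, X, degree, a, b)
        X, _ = np.linalg.qr(X)  # re-orthonormalize the filtered block
        ritz, X = rayleigh_ritz(diag, off, X)
    return ritz[:n_want], X[:, :n_want]
```

As a usage sketch, a 1D harmonic-oscillator grid can be set up with `x = np.linspace(-5.0, 5.0, 100)`, `h = x[1] - x[0]`, `diag = 1.0 / h**2 + 0.5 * x**2` and `off = -0.5 / h**2`, then passed to `chfsi_lowest(diag, off, n_want=4, n_sub=8)`. Nearly all the work sits in the repeated `apply_hamiltonian` calls inside the filter, which corresponds to the matrix-matrix multiplication step the description identifies as the one offloaded to the GPU.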
Provided by:
doi.org



