Execution time of double-precision and high-precision GEMM implementations on Intel Core i5-7500 and NVIDIA Turing RTX 2080
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/5dgdc42x7p
Description:
This dataset contains execution times for general matrix-matrix multiplication (GEMM, BLAS Level 3) kernels implemented with existing double-precision linear algebra software and with multiple-precision libraries for CPU and GPU. The operation is C = α * op(A) * op(B) + β * C, where α and β are scalars, A, B, and C are matrices, op(A) is an M-by-K matrix, op(B) is a K-by-N matrix, C is an M-by-N matrix, and op(X) is either X or X^T. Each raw file contains the results of three test runs, in milliseconds. The complete source code for the tests is available at https://github.com/kisupov/mpres-blas.
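To make the operation concrete, the following is a minimal reference sketch of C = α * op(A) * op(B) + β * C in plain Python. It is not taken from the dataset's source code; the function name `gemm` and the flags `trans_a` and `trans_b` are assumptions chosen for illustration, and the triple loop is meant only to pin down the semantics, not to be fast.

```python
def gemm(alpha, a, b, beta, c, trans_a=False, trans_b=False):
    """Return alpha * op(A) * op(B) + beta * C as a new list of rows."""
    def op(x, trans):
        # op(X) = X^T when trans is set, otherwise X itself
        return [list(col) for col in zip(*x)] if trans else x

    op_a, op_b = op(a, trans_a), op(b, trans_b)
    m, k, n = len(op_a), len(op_b), len(op_b[0])
    # op(A) must be M-by-K, op(B) K-by-N, and C M-by-N
    if len(op_a[0]) != k or len(c) != m or len(c[0]) != n:
        raise ValueError("incompatible matrix dimensions")
    return [
        [alpha * sum(op_a[i][p] * op_b[p][j] for p in range(k)) + beta * c[i][j]
         for j in range(n)]
        for i in range(m)
    ]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]

# "Transposed A" case: op(A) = A^T, op(B) = B
result = gemm(2.0, a, b, 0.5, c, trans_a=True)
```

The sketch returns a new matrix rather than overwriting C in place, which differs from the usual BLAS convention but keeps the example easy to check by hand.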
Common experiment settings:
• Dense, random, 1000-by-1000 general matrices A, B and C;
• Random scalars α and β;
• Measurements are in milliseconds;
• Arithmetic precision from 106 to 424 bits.
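As a rough guide to what these precisions mean in decimal terms, the sketch below converts a significand width in bits to an approximate number of decimal digits, using digits ≈ bits × log10(2). The helper name `decimal_digits` is an assumption for illustration. The two endpoints happen to equal 2 and 8 times the 53-bit double-precision significand; which intermediate precision levels were measured is not stated here, so only the endpoints are shown.

```python
import math

DOUBLE_SIGNIFICAND_BITS = 53  # precision of IEEE 754 binary64


def decimal_digits(bits):
    """Approximate number of decimal digits carried by a `bits`-bit significand."""
    return math.floor(bits * math.log10(2))


for bits in (53, 106, 424):
    # Second column: how many 53-bit doubles give the same total width
    print(bits, bits // DOUBLE_SIGNIFICAND_BITS, decimal_digits(bits))
```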
Test cases considered:
• Non-transposed: op(A) = A, op(B) = B;
• Transposed A: op(A) = A^T, op(B) = B;
• Transposed B: op(A) = A, op(B) = B^T;
• Transposed both A and B: op(A) = A^T, op(B) = B^T.
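The four cases differ in how A and B are stored, not in the shape of the result. The sketch below, with the hypothetical helper `stored_shapes`, lists the stored shape of each input so that op(A) is M-by-K and op(B) is K-by-N. For the square 1000-by-1000 matrices used here all shapes coincide, so the cases differ only in memory access pattern.

```python
def stored_shapes(m, n, k, trans_a, trans_b):
    """Stored shapes of A and B such that op(A) is M-by-K and op(B) is K-by-N."""
    shape_a = (k, m) if trans_a else (m, k)
    shape_b = (n, k) if trans_b else (k, n)
    return shape_a, shape_b


# (trans_a, trans_b) flags for the four test cases
CASES = {
    "Non-transposed": (False, False),
    "Transposed A": (True, False),
    "Transposed B": (False, True),
    "Transposed both A and B": (True, True),
}

for name, (trans_a, trans_b) in CASES.items():
    print(name, stored_shapes(1000, 1000, 1000, trans_a, trans_b))
```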
Experimental environment:
• Intel Core i5-7500 processor;
• 32 GB of DDR4 system memory;
• NVIDIA Turing RTX 2080 GPU (2944 CUDA Cores, Compute Capability 7.5, 8 GB of GDDR6 memory);
• Ubuntu 20.04.5 LTS;
• NVIDIA Driver V455.32.00;
• CUDA Toolkit V11.1.
The following GEMM implementations are evaluated:
• OpenBLAS (OpenMP, 53 bits) – double-precision implementation for CPU using OpenBLAS (https://github.com/xianyi/OpenBLAS);
• Custom double on CPU (OpenMP, 53 bits) – custom double-precision parallel (OpenMP) implementation;
• MPFR (OpenMP) – multiple-precision parallel implementation using the GNU MPFR Library for CPU (https://www.mpfr.org/);
• cuBLAS (53 bits) – double-precision implementation for CUDA using the NVIDIA Basic Linear Algebra Subroutines library (https://docs.nvidia.com/cuda/cublas/index.html);
• Custom double on GPU (53 bits) – custom double-precision CUDA implementation;
• MPRES-BLAS – multiple-precision CUDA implementation using the MPRES-BLAS library (https://github.com/kisupov/mpres-blas);
• CAMPARY – multiple-precision CUDA implementation using the CAMPARY library (https://homepages.laas.fr/mmjoldes/campary/).
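One way to compare these implementations is to summarize the three runs recorded for each and express the result relative to a baseline. The sketch below assumes the three timings have already been read into a list; the layout of the raw files is not described here, so no parsing is shown. The function names and the timing values are hypothetical placeholders and are not taken from the dataset.

```python
from statistics import mean


def summarize(runs_ms):
    """Minimum, mean and maximum of a list of run times in milliseconds."""
    return {"min": min(runs_ms), "mean": mean(runs_ms), "max": max(runs_ms)}


def slowdown(runs_ms, baseline_ms):
    """Ratio of mean run time to the mean run time of a baseline."""
    return mean(runs_ms) / mean(baseline_ms)


# Hypothetical values for illustration only (three runs each, in milliseconds)
baseline = [10.0, 12.0, 11.0]
candidate = [220.0, 230.0, 210.0]

print(summarize(candidate))
print(slowdown(candidate, baseline))
```

With only three runs per configuration, reporting the minimum alongside the mean helps show how much run-to-run variation there is.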
Created:
2022-12-19



