
Execution time of double-precision and high-precision GEMM implementations on Intel Core i5-7500 and NVIDIA Turing RTX 2080

Indexed from Mendeley Data on 2026-04-18
Download link:
https://data.mendeley.com/datasets/5dgdc42x7p
Description:
This dataset contains execution times for matrix-matrix multiplication kernels with general matrices (GEMM, BLAS Level 3), implemented with existing double-precision linear algebra software and with multiple-precision libraries for CPU and GPU. The operation is C = α * op(A) * op(B) + β * C, where α and β are scalars; A, B and C are matrices; op(A) is an M-by-K matrix, op(B) is a K-by-N matrix and C is an M-by-N matrix; and op(X) is either X or X^T. Each raw file contains the results of three test runs, in milliseconds. The complete source code for the tests is available at https://github.com/kisupov/mpres-blas.

Common experiment settings:
• Dense, random, 1000-by-1000 general matrices A, B and C;
• Random scalars α and β;
• Measurements in milliseconds;
• Arithmetic precision from 106 to 424 bits for the multiple-precision implementations.

Test cases:
• Non-transposed: op(A) = A, op(B) = B;
• Transposed A: op(A) = A^T, op(B) = B;
• Transposed B: op(A) = A, op(B) = B^T;
• Both A and B transposed: op(A) = A^T, op(B) = B^T.

Experimental environment:
• Intel Core i5-7500 processor;
• 32 GB of DDR4 system memory;
• NVIDIA Turing RTX 2080 GPU (2944 CUDA cores, compute capability 7.5, 8 GB of GDDR6 memory);
• Ubuntu 20.04.5 LTS;
• NVIDIA driver 455.32.00;
• CUDA Toolkit 11.1.
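To make the four transpose cases of C = α * op(A) * op(B) + β * C concrete, here is a minimal, unoptimized Python sketch in ordinary double precision. It is not the benchmarked code from the mpres-blas repository, and the names `op`, `gemm` and `mean_gflops` are hypothetical. The throughput helper assumes the common convention of 2 * M * N * K floating-point operations per GEMM (ignoring the M * N scalings by α and β) and averages the three runs recorded per raw file; the dataset itself reports only times.

```python
from typing import List

Matrix = List[List[float]]


def op(x: Matrix, trans: bool) -> Matrix:
    """Return X^T when trans is True, otherwise X itself."""
    return [list(row) for row in zip(*x)] if trans else x


def gemm(trans_a: bool, trans_b: bool, alpha: float, a: Matrix,
         b: Matrix, beta: float, c: Matrix) -> Matrix:
    """Naive reference for C = alpha * op(A) * op(B) + beta * C (returns a new matrix)."""
    oa, ob = op(a, trans_a), op(b, trans_b)
    m, k = len(oa), len(oa[0])          # op(A) is M-by-K
    if len(ob) != k:
        raise ValueError("inner dimensions of op(A) and op(B) differ")
    n = len(ob[0])                      # op(B) is K-by-N
    if len(c) != m or len(c[0]) != n:
        raise ValueError("C must be M-by-N")
    result = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = 0.0
            for p in range(k):          # inner product of row i and column j
                acc += oa[i][p] * ob[p][j]
            row.append(alpha * acc + beta * c[i][j])
        result.append(row)
    return result


def mean_gflops(times_ms: List[float], m: int, n: int, k: int) -> float:
    """GFLOP/s from the mean of repeated runs, assuming 2*M*N*K flops per GEMM."""
    mean_s = sum(times_ms) / len(times_ms) / 1e3
    return 2.0 * m * n * k / mean_s / 1e9
```

For the 1000-by-1000 setting, 2 * M * N * K = 2 * 10^9 operations, so a mean time of 1000 ms corresponds to 2 GFLOP/s under this convention.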
The following GEMM implementations are evaluated:
• OpenBLAS (OpenMP, 53 bits) – double-precision implementation for CPU using OpenBLAS (https://github.com/xianyi/OpenBLAS);
• Custom double on CPU (OpenMP, 53 bits) – custom double-precision parallel (OpenMP) implementation;
• MPFR (OpenMP) – multiple-precision parallel implementation for CPU using the GNU MPFR Library (https://www.mpfr.org/);
• cuBLAS (53 bits) – double-precision implementation for CUDA using the NVIDIA Basic Linear Algebra Subroutines library (https://docs.nvidia.com/cuda/cublas/index.html);
• Custom double on GPU (53 bits) – custom double-precision CUDA implementation;
• MPRES-BLAS – multiple-precision CUDA implementation using the MPRES-BLAS library (https://github.com/kisupov/mpres-blas);
• CAMPARY – multiple-precision CUDA implementation using the CAMPARY library (https://homepages.laas.fr/mmjoldes/campary/).
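The lower end of the tested range, 106 bits, is exactly the precision of a double-double number: an unevaluated sum hi + lo of two 53-bit doubles. As a hedged illustration of how such arithmetic can be built from ordinary double operations, the sketch below uses the classic error-free transformations TwoSum (Knuth) and TwoProduct (Dekker, with Veltkamp splitting) to accumulate a dot product, the inner loop of GEMM, in double-double. This is a generic textbook technique, not the code of MPFR, MPRES-BLAS or CAMPARY; the function names are hypothetical, and overflow and underflow are not handled.

```python
from typing import List, Tuple

SPLITTER = 134217729.0  # 2**27 + 1, Veltkamp splitting constant for binary64


def two_sum(a: float, b: float) -> Tuple[float, float]:
    """Knuth's TwoSum: s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def split(a: float) -> Tuple[float, float]:
    """Veltkamp split of a into hi + lo, each fitting in about 26 bits."""
    t = SPLITTER * a
    hi = t - (t - a)
    return hi, a - hi


def two_prod(a: float, b: float) -> Tuple[float, float]:
    """Dekker's TwoProduct: p = fl(a * b) and p + e == a * b exactly (barring overflow or underflow)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e


def dot_dd(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Dot product accumulated as a double-double value hi + lo."""
    hi, lo = 0.0, 0.0
    for xi, yi in zip(x, y):
        p, pe = two_prod(xi, yi)   # exact product split into two doubles
        s, se = two_sum(hi, p)     # exact sum of running hi and product
        lo += se + pe              # gather the rounding errors in lo
        hi, lo = two_sum(s, lo)    # renormalize so |lo| <= ulp(hi) / 2
    return hi, lo
```

With x = [1.0, 1e-20, -1.0] and y = [1.0, 1.0, 1.0], a plain double loop returns 0.0 because 1e-20 is absorbed by 1.0, whereas `dot_dd` returns hi = 1e-20.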
Created: 2022-12-19