
High-performance OpenCL-based GEMM Optimization

DataCite Commons | Updated: 2024-04-16 | Indexed: 2025-04-16
Download link:
https://ieee-dataport.org/documents/high-performance-opencl-based-gemm-optimization
Description:
    OpenCL has become the preferred framework for emerging heterogeneous devices and FPGAs, owing to its versatility and portability. However, OpenCL-based math libraries still struggle to exploit the full performance of these devices. When high-performance numerical applications are deployed on them, the most performance-critical routine is General Matrix-Matrix Multiplication (GEMM). This study presents a carefully optimized OpenCL GEMM kernel built around two key improvements: (1) a three-level double-buffer pipeline that overlaps data fetching with floating-point computation, and (2) a fine-grained private-memory prefetching strategy that raises device occupancy by making better use of registers. This work also presents a Bayesian Optimization (BO) tuner for kernel auto-tuning. Experimental results show considerable performance improvements across diverse OpenCL devices, and the BO tuner proves more efficient and robust than contemporary tuning methods.
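The abstract does not include code. As a rough illustration of the double-buffering idea only, the Python sketch below walks the K dimension of C = A·B in tiles and alternates between two staging buffers: while tile t is consumed from one buffer, tile t+1 is fetched into the other. This is not the authors' kernel. The function names (`load_tile`, `gemm_double_buffered`, `gemm_naive`) are hypothetical, the sketch models a single buffering level rather than the three-level pipeline described above, and because Python runs the fetch and the compute sequentially, no real overlap takes place.

```python
import random


def load_tile(A, B, k0, tile):
    """Copy one K-slab: columns k0..k0+tile of A and rows k0..k0+tile of B.
    On an OpenCL device this step would be a fetch into faster memory."""
    a_slab = [row[k0:k0 + tile] for row in A]
    b_slab = [row[:] for row in B[k0:k0 + tile]]
    return a_slab, b_slab


def gemm_double_buffered(A, B, tile):
    """Tiled GEMM with two ping-pong staging buffers (sequential sketch)."""
    n, k, m = len(A), len(B), len(B[0])
    if k % tile != 0:
        raise ValueError("sketch assumes K is a multiple of the tile size")
    C = [[0.0] * m for _ in range(n)]
    num_tiles = k // tile
    buffers = [None, None]                    # ping-pong staging buffers
    buffers[0] = load_tile(A, B, 0, tile)     # prologue: fill buffer 0
    for t in range(num_tiles):
        cur = t % 2
        # Fetch the next slab into the *other* buffer. Here it runs before the
        # compute; on a device this load would overlap the compute below.
        if t + 1 < num_tiles:
            buffers[1 - cur] = load_tile(A, B, (t + 1) * tile, tile)
        a_slab, b_slab = buffers[cur]
        for i in range(n):
            for j in range(m):
                C[i][j] += sum(a_slab[i][p] * b_slab[p][j] for p in range(tile))
    return C


def gemm_naive(A, B):
    """Reference triple-loop GEMM used to check the tiled version."""
    return [[sum(A[i][p] * B[p][j] for p in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


if __name__ == "__main__":
    random.seed(0)
    A = [[random.randint(-3, 3) for _ in range(32)] for _ in range(8)]
    B = [[random.randint(-3, 3) for _ in range(5)] for _ in range(32)]
    assert gemm_double_buffered(A, B, tile=8) == gemm_naive(A, B)
    print("double-buffered result matches naive GEMM")
```

The two-buffer indexing (`cur` and `1 - cur`) is the essential part: the buffer being filled is never the one being read, which is what allows a device to issue the next fetch before the current computation finishes.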
Provider:
IEEE DataPort
Created:
2024-04-16