AcceleratedKernels.jl Arithmetic and Sorting Benchmarks and HPC Logs

Name: AcceleratedKernels.jl Arithmetic and Sorting Benchmarks and HPC Logs
Creator: IEEE DataPort
Published: 2024-10-14 01:25:35
License: No description available

DataCite Commons. Updated 2024-10-14; indexed 2025-04-16.

Download link:

https://ieee-dataport.org/documents/acceleratedkernelsjl-arithmetic-and-sorting-benchmarks-and-hpc-logs

Description:

Benchmark code, HPC runtime logs, and analysis for the paper "AcceleratedKernels.jl: Cross-Architecture Parallel Algorithms from a Unified, Transpiled Codebase". AcceleratedKernels.jl is a backend-agnostic library for parallel computing in Julia, natively targeting NVIDIA, AMD, Intel, and Apple accelerators via a unique transpilation architecture. Written in a unified, compact codebase, it enables productive parallel programming with minimised implementation and usage complexities. Benchmarks of arithmetic-heavy kernels show performance on par with C and OpenMP-multithreaded CPU implementations, with Julia sometimes offering more consistent and predictable numerical performance than conventional C compilers. The library's composability is highlighted by simultaneous CPU-GPU co-processing, such as CPU-GPU co-sorting, with transparent use of hardware-specialised MPI implementations. Tests on the Baskerville Tier 2 UK HPC cluster achieved world-class sorting throughputs of 538-855 GB/s using 200 NVIDIA A100 GPUs, comparable to the highest literature-reported figure of 900 GB/s achieved on 262,144 CPU cores. The use of direct NVLink GPU-to-GPU interconnects resulted in a 4.93x speedup on average; normalised by a combined capital, running and environmental cost, communication-heavy HPC tasks only become economically viable on GPUs if GPUDirect interconnects are employed.
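
To put the aggregate sorting throughputs quoted above on a per-device footing, the short Python sketch below divides each figure by its device count. It uses only the numbers stated in the description; the variable and function names are illustrative, 1 GB is assumed to be 1000 MB, and the result is plain arithmetic rather than a measurement or a figure taken from the paper or the dataset.

```python
# Figures quoted in the dataset description; nothing here is measured.
GPU_COUNT = 200                        # NVIDIA A100 GPUs on Baskerville
GPU_THROUGHPUT_GBPS = (538.0, 855.0)   # reported aggregate sorting range, GB/s
CPU_CORE_COUNT = 262_144               # cores behind the literature figure
CPU_THROUGHPUT_GBPS = 900.0            # highest literature-reported figure, GB/s


def per_device(total_gbps: float, devices: int) -> float:
    """Average throughput per device, in GB/s."""
    return total_gbps / devices


gpu_low, gpu_high = (per_device(t, GPU_COUNT) for t in GPU_THROUGHPUT_GBPS)
cpu_per_core = per_device(CPU_THROUGHPUT_GBPS, CPU_CORE_COUNT)

print(f"per GPU:      {gpu_low:.3f} to {gpu_high:.3f} GB/s")
# Assumption: 1 GB = 1000 MB for the unit conversion.
print(f"per CPU core: {cpu_per_core * 1000:.3f} MB/s")
```

A per-device average says nothing about cost, power, or interconnect, which the description treats separately through its combined capital, running and environmental cost normalisation.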

Provider:

IEEE DataPort

Created:

2024-10-14
