Benchmark results for the ndzip-gpu floating point compressor
Download link:
https://zenodo.org/record/4892883
Resource description:
These are tabular benchmark results collected for the ndzip-gpu floating point compressor (GitHub, DOI) as well as a range of other general-purpose and floating-point compressors on both CPU and GPU.
The following algorithms were examined:
ndzip-gpu (the version current at the time of dataset submission to Zenodo)
ndzip (SIMD CPU implementation)
MPC 1.2
GFC 2.2
cudppCompress from CUDPP 2.3
ZFP 0.5.5
fpzip 1.3.0
LZMA (liblzma 5.2.5)
NVCOMP 2.0 schemes LZ4 and Cascaded
Compressor and decompressor performance was evaluated on the following systems:
One node of the Marconi-100 supercomputer, featuring dual POWER9 AC922 CPUs with 256 GB RAM and four Nvidia Tesla V100 Volta HPC GPUs (Compute Capability 7.0).
One AMD Ryzen 9 3900X desktop system with 64 GB RAM and one Nvidia RTX 2070 SUPER mid-range Turing consumer GPU (Compute Capability 7.5).
One Nvidia DGX A100 node featuring dual AMD EPYC 7742 CPUs with 1 TB RAM and eight Nvidia A100 40GB Ampere HPC GPUs (Compute Capability 8.0).
One dual-socket AMD EPYC 7282 node with 256 GB RAM and four Nvidia RTX 3090 high-end Ampere consumer GPUs (Compute Capability 8.6).
Software and compilers used for evaluation:
Clang 10.0
CUDA 11.0 to 11.3
Linux
Test datasets are described in the following whitepaper:
Fabian Knorr, Peter Thoman, and Thomas Fahringer: "Datasets for benchmarking floating-point compressors", arXiv.org, 2020.
Each configuration (CSV line) on each system was benchmarked at least 5 times and for at least 1 second in total.
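The repetition rule above (at least 5 runs and at least 1 second in total) can be sketched as a simple loop. This is only an illustration of that stopping condition, not the harness used to produce these results: the function name `benchmark`, its parameters, and the choice to count only the timed region toward the 1-second minimum are all assumptions.

```python
import time


def benchmark(fn, min_reps=5, min_total_seconds=1.0, clock=time.perf_counter):
    """Call fn until BOTH minimums are met; return per-run durations in seconds.

    Hypothetical helper: the names and defaults are illustrative assumptions,
    not taken from the ndzip benchmark code.
    """
    durations = []
    total = 0.0
    # Keep going while either condition is still unmet.
    while len(durations) < min_reps or total < min_total_seconds:
        start = clock()
        fn()
        elapsed = clock() - start
        durations.append(elapsed)
        total += elapsed
    return durations
```

With this rule, a fast configuration is repeated many times until one second of measured time has accumulated, while a slow configuration is still run five times even if a single run already exceeds one second.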
For ndzip-gpu, the second number in the file name (-128, -256) indicates the (tunable) number of threads per block. Several values were evaluated for parameter tuning; the values we deem optimal are 256 for single-precision and 512 for double-precision data.
Created:
2021-06-02



