Experiment setup parameters.
Source: Figshare | Updated: 2026-02-03 | Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/_p_Experiment_setup_parameters_p_/31243615
Description:
Sorting can be approached in two main ways: sequentially and in parallel. Sequential sorting processes data in a single thread, which can be slow for large datasets. Parallel sorting, by contrast, divides the task across multiple processing units and processes data simultaneously, which yields faster results. Compute Unified Device Architecture (CUDA) lets developers use GPUs for general-purpose parallel computing, significantly accelerating tasks such as sorting. This paper investigates the GPU-based parallelization of merge sort (MS), quick sort (QS), bubble sort (BS), radix top-k selection sort (RS), and slow sort (SS), and presents optimized algorithms designed for efficient sorting of large datasets on modern GPUs. The primary objective is to evaluate the performance of these algorithms on GPUs using CUDA, with a focus on parallel time complexity and space complexity across various data types. Experiments are conducted on four dataset scenarios: randomly generated, reverse-sorted, already-sorted, and nearly-sorted data. The GPU-accelerated implementations are also compared with their sequential counterparts to assess improvements in computational efficiency and scalability.
Earlier generations of GPU-based implementations of this type typically achieved speedups of 2× to 9× over scalar CPU code. Newer GPU enhancements, including parallel-aware primitives and radix- or merge-optimized operations, have improved these rates significantly. Our experiments indicate that GPU-based Radix Sort achieves a speedup of approximately 50× (sequential: 240.8 ms, parallel: 4.83 ms) on 10 million random elements. Quick Sort and Merge Sort achieve speedups of 97× and 103×, respectively (Quick: 1,461.97 ms vs. 15.1 ms; Merge: 2,212.33 ms vs. 21.4 ms). Bubble Sort improves substantially when parallelized (123,321.9 ms to 7,377.8 ms, an ≈17× improvement) but remains considerably worse overall. Slow Sort shows a moderate but consistent acceleration, with execution time falling from 74.07 ms in the sequential version to 3.99 ms on the GPU, an ≈18.6× speedup. These findings confirm that the new single-GPU implementations achieve speedups ranging from 17× to over 100×, surpassing the typical gains reported for previous generations and matching or exceeding the acceleration rates reported for cutting-edge parallel sorting algorithms in recent studies.
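The speedup figures quoted in the description can be reproduced from the reported timings by dividing sequential time by parallel time. The following is a minimal Python sketch of that arithmetic only; it is not part of the dataset or the authors' code, and the names TIMINGS_MS, speedup, and speedup_table are hypothetical helpers introduced here for illustration.

```python
# Timings in milliseconds, copied from the description:
# (sequential, GPU-parallel) per algorithm.
# TIMINGS_MS is a hypothetical name, not something shipped with the dataset.
TIMINGS_MS = {
    "Radix Sort": (240.8, 4.83),
    "Quick Sort": (1461.97, 15.1),
    "Merge Sort": (2212.33, 21.4),
    "Bubble Sort": (123321.9, 7377.8),
    "Slow Sort": (74.07, 3.99),
}


def speedup(sequential_ms, parallel_ms):
    """Return the speedup factor: sequential time / parallel time."""
    if parallel_ms <= 0:
        raise ValueError("parallel time must be positive")
    return sequential_ms / parallel_ms


def speedup_table(timings):
    """Map each algorithm name to its speedup factor."""
    return {name: speedup(seq, par) for name, (seq, par) in timings.items()}


if __name__ == "__main__":
    # Print the algorithms from largest to smallest speedup.
    table = speedup_table(TIMINGS_MS)
    for name, factor in sorted(table.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<12} {factor:6.1f}x")
```

Note that speedup is a relative measure: Bubble Sort's factor is close to Slow Sort's, yet its absolute GPU time (7,377.8 ms) is still far higher than any other algorithm's sequential time except its own.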
Created:
2026-02-03



