Classical bounds on two-outcome bipartite Bell expressions and linear prepare-and-measure witnesses: Efficient computation in parallel environments such as graphics processing units
Source: Mendeley Data. Indexed: 2026-04-09
Download link:
https://data.mendeley.com/datasets/scfjjt9svm/1
Description:
The program speeds up the brute-force computation of the so-called L_d norm of a matrix M using graphics processing units (GPUs). CPU implementations are also provided, and the algorithm is applicable to any parallel environment. The n x m matrix M has real elements, which may represent the coefficients of a bipartite Bell expression or those of a linear prepare-and-measure (PM) witness. In this interpretation, the L_1 norm is the local bound of the given correlation-type Bell expression, and the L_d norm for d ≥ 2 is the classical d-dimensional bound of the given PM witness, which is associated with the communication of d-level classical messages in the PM scenario. The program can also calculate the local bound of Bell expressions that include marginals. In all scenarios, the measurement outcomes are assumed to be binary.
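As an illustration of what the brute-force computation involves, the following minimal Python sketch enumerates all deterministic strategies for a small matrix. It assumes the usual definitions: L_1(M) is the maximum over sign vectors a in {+1, -1}^n of sum_j |sum_i a_i M_ij|, and L_d(M) for d ≥ 2 is the maximum over assignments of the n rows to d groups of the summed absolute column sums of each group. The function names `l1_norm` and `ld_norm` are hypothetical and are not taken from the published program, which is written in CUDA C and is far more optimized.

```python
from itertools import product


def l1_norm(M):
    """Brute-force L_1 norm of a real matrix given as a list of rows.

    Assumed definition: max over a in {+1,-1}^n of sum_j |sum_i a_i * M[i][j]|.
    """
    n, m = len(M), len(M[0])
    best = float("-inf")
    # Fix a_0 = +1: flipping every sign leaves the value unchanged,
    # so only 2^(n-1) sign vectors need to be visited.
    for tail in product((1, -1), repeat=n - 1):
        a = (1,) + tail
        value = sum(
            abs(sum(a[i] * M[i][j] for i in range(n))) for j in range(m)
        )
        best = max(best, value)
    return best


def ld_norm(M, d):
    """Brute-force L_d norm for d >= 2 (assumed definition, see lead-in).

    Each row is assigned one of d classical messages; for every message and
    every column the receiver picks the sign that maximizes the contribution.
    """
    n, m = len(M), len(M[0])
    best = float("-inf")
    for labels in product(range(d), repeat=n):
        group_sums = [[0.0] * m for _ in range(d)]
        for i, k in enumerate(labels):
            for j in range(m):
                group_sums[k][j] += M[i][j]
        value = sum(abs(s) for group in group_sums for s in group)
        best = max(best, value)
    return best


# CHSH coefficient matrix: the local bound is 2.
chsh = [[1, 1], [1, -1]]
print(l1_norm(chsh))     # 2
print(ld_norm(chsh, 2))  # 4.0
```

The cost grows as 2^(n-1) for `l1_norm` and d^n for `ld_norm`, which is why a parallel implementation matters for matrices of practical size.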
The GPU code is written in CUDA C and uses a single NVIDIA GPU. To illustrate the performance of our implementation, we refer to Brierley et al. [1], who needed approximately three weeks to compute the local bound of a Bell expression defined by a 42 x 42 matrix on a standard desktop using a single CPU core. In contrast, our efficient implementation of the brute-force algorithm reduces this to three minutes using a single NVIDIA RTX 6000 Ada graphics card in a workstation. For CPUs, the algorithm was implemented with OpenMP and MPI, following the shared and distributed memory models, respectively, and achieves a comparable speedup with roughly 100 CPU cores.
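To give a sense of scale, and assuming a plain enumeration over the row signs with one sign fixed, a 42 x 42 matrix requires visiting 2^41 ≈ 2.2 × 10^12 sign vectors; going from three weeks (30,240 minutes) to three minutes corresponds to a speedup of roughly 10^4. The search parallelizes well because the index range can be cut into independent chunks whose partial maxima are combined at the end. The sketch below shows that decomposition in Python. The names `l1_chunk` and `l1_norm_chunked` are hypothetical, the chunks run sequentially here, and this is not the authors' CUDA kernel, OpenMP loop, or MPI layout.

```python
def l1_chunk(M, start, stop):
    """Partial maximum over sign vectors with indices in [start, stop).

    Bit (i - 1) of the index selects the sign of row i for i >= 1;
    row 0 is fixed at +1 to exploit the global sign symmetry.
    """
    n, m = len(M), len(M[0])
    best = float("-inf")
    for index in range(start, stop):
        col = list(M[0])
        for i in range(1, n):
            sign = -1 if (index >> (i - 1)) & 1 else 1
            for j in range(m):
                col[j] += sign * M[i][j]
        best = max(best, sum(abs(c) for c in col))
    return best


def l1_norm_chunked(M, num_chunks):
    """Split the 2^(n-1) indices into chunks and reduce the partial maxima.

    In a parallel environment each chunk would go to its own thread,
    process, or GPU block; here they are evaluated one after another.
    """
    total = 1 << (len(M) - 1)
    size = -(-total // num_chunks)  # ceiling division
    partial = [
        l1_chunk(M, s, min(s + size, total)) for s in range(0, total, size)
    ]
    return max(partial)


print(l1_norm_chunked([[1, 1], [1, -1]], 2))  # 2
```

Because each chunk only needs the matrix and its own index range, no communication is required until the final maximum, which suits both shared-memory and distributed-memory settings.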
Provided by:
Magyar Tudomanyos Akademia Atommagkutato Intezet



