Basic Blocks Dataset
Source: arXiv. Indexed 2025-09-30
Download link:
https://nmt4binaries.github.io
Description:
This dataset contains 202,252 pairs of semantically similar basic blocks extracted from OpenSSL and four popular Linux software packages. The two blocks in each pair come from the same source code, compiled for two target architectures: x86-64 and ARM. To prepare the dataset, we used a modified version of the LLVM compiler to annotate basic blocks originating from the same source code. The intended task is similarity comparison of cross-architecture instruction embeddings.
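As a rough illustration of the similarity-comparison task, the sketch below scores a pair of basic blocks by the cosine similarity of their embeddings. It makes several assumptions that do not come from the dataset itself: each instruction is already mapped to a fixed-size vector, a block embedding is taken to be the plain average of its instruction vectors, and the toy vectors and the helper names `block_embedding` and `cosine_similarity` are hypothetical. The dataset authors' actual model may differ.

```python
import math


def block_embedding(instruction_vectors):
    """Average per-instruction vectors into one block vector (an assumed pooling choice)."""
    dim = len(instruction_vectors[0])
    count = len(instruction_vectors)
    return [sum(vec[i] for vec in instruction_vectors) / count for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up instruction embeddings, for illustration only.
x86_block = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]    # stands in for an x86-64 basic block
arm_block = [[1.0, 1.0, 2.0], [0.0, 0.0, 0.0]]    # stands in for its ARM counterpart
unrelated = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0]]  # stands in for an unrelated block

sim_pair = cosine_similarity(block_embedding(x86_block), block_embedding(arm_block))
sim_other = cosine_similarity(block_embedding(x86_block), block_embedding(unrelated))

print(f"paired blocks:    {sim_pair:.3f}")
print(f"unrelated blocks: {sim_other:.3f}")
```

With embeddings that capture semantics across architectures, a pair drawn from the dataset would be expected to score higher than a randomly matched pair, which is the property this kind of comparison is meant to test.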
Provider:
NMT4Binaries



