DeepDataFlow
Indexed by NIAID Data Ecosystem on 2026-03-12
Download link:
https://zenodo.org/record/4122436
Description:
This dataset contains 493k LLVM-IRs taken from a wide range of projects and source programming languages, and includes labels for several compiler data flow analyses. We also include the logs of the machine learning jobs that produced our published experimental results.
The uncompressed dataset uses the following layout:
labels/
Directory containing machine learning features and labels produced by running compiler data flow analyses on the programs.
labels//...ProgramFeaturesList.pb
A ProgramFeaturesList protocol buffer containing a list of features resulting from running a data flow analysis on a program.
graphs/
Directory containing ProGraML representations of LLVM IRs.
graphs/...ProgramGraph.pb
A ProgramGraph protocol buffer of an LLVM IR in the ProGraML representation.
ll/
Directory containing LLVM-IR files.
ll/...ll
An LLVM IR in text format, as produced by clang -emit-llvm -S or equivalent.
test/
A directory containing symlinks to graphs in the graphs/ directory, indicating which graphs should be used as part of the test set.
train/
A directory containing symlinks to graphs in the graphs/ directory, indicating which graphs should be used as part of the training set.
val/
A directory containing symlinks to graphs in the graphs/ directory, indicating which graphs should be used as part of the validation set.
vocab/
Directory containing vocabulary files.
vocab/.csv
A vocabulary file, which lists unique node texts, their frequency in the dataset, and the cumulative proportion of total unique node texts that is covered.
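The vocabulary files rank node texts by frequency and record how much of the dataset each prefix of the vocabulary covers. The Python sketch below illustrates that bookkeeping. It is not the script that produced the vocab/ files. It assumes the cumulative proportion is measured over node-text occurrences, which is one plausible reading of the description above. Because the CSV column layout is not documented here, the sketch builds rows from a plain list of node texts and does not parse the real files. build_vocab and truncate_vocab are hypothetical helper names.

```python
from collections import Counter


def build_vocab(node_texts):
    """Return (text, count, cumulative_coverage) rows, most frequent first.

    Hypothetical helper. ASSUMPTION: coverage is the fraction of all
    node-text occurrences accounted for by this row and every row above it.
    """
    counts = Counter(node_texts)
    total = sum(counts.values())
    rows = []
    covered = 0
    # Descending frequency; ties broken alphabetically so output is deterministic.
    for text, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        covered += count
        rows.append((text, count, covered / total))
    return rows


def truncate_vocab(rows, min_coverage):
    """Return the shortest prefix of rows whose coverage reaches min_coverage."""
    for i, (_, _, coverage) in enumerate(rows):
        if coverage >= min_coverage:
            return rows[: i + 1]
    return list(rows)  # threshold never reached: keep everything
```

For instance, truncate_vocab(build_vocab(texts), 0.9) keeps the smallest frequency-ranked vocabulary that covers at least 90% of node occurrences. Texts outside that vocabulary would then usually be mapped to a single unknown token.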
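The train/, val/ and test/ directories hold symlinks, not copies, so a loader only has to follow each link back into graphs/. The following minimal Python sketch performs that step using only the standard library. The function names resolve_split and split_sizes are hypothetical. The sketch assumes the archive has been unpacked to a local directory and that every split entry is a symlink into graphs/. It returns file paths only. Decoding the ProgramGraph protocol buffers is left to the ProGraML tooling.

```python
from pathlib import Path

# Assumption: split names match the directory names in the layout above.
SPLITS = ("train", "val", "test")


def resolve_split(root, split):
    """Return the real graphs/ paths referenced by one split directory.

    Hypothetical helper: assumes each entry in <root>/<split> is a symlink
    into <root>/graphs/. Entries that do not resolve to an existing file
    inside graphs/ (e.g. dangling links) are skipped.
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    root = Path(root)
    graphs_dir = (root / "graphs").resolve()
    paths = []
    for entry in sorted((root / split).iterdir()):
        target = entry.resolve()  # follows the symlink (non-strict)
        if target.is_file() and graphs_dir in target.parents:
            paths.append(target)
    return paths


def split_sizes(root):
    """Map each split name to the number of graphs it references."""
    return {split: len(resolve_split(root, split)) for split in SPLITS}
```

For example, split_sizes("/data/deepdataflow") (a hypothetical path) would return a dict that maps each split name to its graph count. Because links that do not land on a file in graphs/ are skipped, an unexpectedly small count can point to an incomplete extraction.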
For further information, please see our ProGraML repository.
Created: 2020-11-05



