Software Similarity Dataset
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/7071382
Description:
This dataset contains the post-processed data for software similarity learning. Further information is available at SoftwareSim_Github.
post_process: Every software project is embedded with an autoencoder so that each function has the same length (1024 bits). Each resulting file is the embedded graph representation of one software project.
final_data: All information extracted by Somef and Inspect4py, after cleaning. Each file represents one software project in the format Function_Name: [[Called Function], [Function Tokens]]
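As a rough illustration of the final_data format, the sketch below models one software project as a mapping from function name to a pair of lists (called functions, function tokens) and derives caller-to-callee edges from it. The on-disk serialization is not described here, so the in-memory dictionary, the function names, and the tokens are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory form of one final_data file:
# Function_Name -> [[Called Function], [Function Tokens]]
Software = Dict[str, List[List[str]]]


def call_edges(software: Software) -> List[Tuple[str, str]]:
    """Return (caller, callee) pairs taken from the called-function lists."""
    edges = []
    for name, entry in software.items():
        called, _tokens = entry  # entry holds exactly two lists
        for callee in called:
            edges.append((name, callee))
    return edges


# Made-up example; these names and tokens are not taken from the dataset.
example: Software = {
    "load_data": [["read_file"], ["def", "load_data", "(", "path", ")"]],
    "read_file": [[], ["def", "read_file", "(", "path", ")"]],
}
edges = call_edges(example)
```

The edge list is one simple way to view a project as a graph; it is not necessarily how the dataset authors built theirs.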
lean_simscore.csv: Software pairs together with their similarity metrics, in the following format:
Property    Example
Graph_1     kakaobrain_helo_word
Graph_2     mblondel_soft-dtw
miniLM      0.4503
Sbert       0.7204
TSDAE       0.5714
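A minimal sketch of reading the similarity file with Python's standard csv module follows. It assumes the file is comma-separated with a header row whose column names match the properties listed above (Graph_1, Graph_2, miniLM, Sbert, TSDAE); that layout is an assumption, so the sample here is built in memory from the example values rather than read from the actual file.

```python
import csv
import io

# Assumed score column names, taken from the property list above.
SCORE_COLUMNS = ("miniLM", "Sbert", "TSDAE")


def read_similarity_rows(handle):
    """Yield one dict per software pair, with score columns parsed as floats."""
    for row in csv.DictReader(handle):
        parsed = {"Graph_1": row["Graph_1"], "Graph_2": row["Graph_2"]}
        for column in SCORE_COLUMNS:
            parsed[column] = float(row[column])
        yield parsed


# In-memory sample built from the example values; the header is assumed.
sample = io.StringIO(
    "Graph_1,Graph_2,miniLM,Sbert,TSDAE\n"
    "kakaobrain_helo_word,mblondel_soft-dtw,0.4503,0.7204,0.5714\n"
)
rows = list(read_similarity_rows(sample))
```

To read the downloaded file instead, pass an open file handle, for example `open("lean_simscore.csv", newline="")`, in place of the in-memory sample.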
Created:
2022-12-28



