Dataset information of the dataset.
Source: Figshare. Updated: 2026-01-20. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/_p_Dataset_information_of_the_dataset_p_/31106883
Description:
Semantic code clone detection plays an essential role in software maintenance and quality assurance, as it helps uncover fragments of code that express the same logic even when their syntax has been altered or deliberately obfuscated. In this study, we propose a framework that combines hybrid representation learning with deep bidirectional LSTM networks. The model is applied to two intermediate forms of Java programs—Baf and Jimple—extracted through the Soot framework, which together provide both syntactic structure and semantic detail. This design allows the method to cope with difficult obfuscation strategies such as polymorphism and metamorphism. In our experiments, the framework showed strong and stable performance. Training accuracy reached about 98%, while validation accuracy stayed above 95%, with good generalization across the different clone categories described in the Twilight-Zone taxonomy. When compared with other recurrent models, the BiLSTM consistently performed better, especially when combined with multiple intermediate representations and attention mechanisms. On the BigCloneBench dataset, the approach matched or exceeded the results of state-of-the-art tools, achieving recall and F1-scores of up to 97% on challenging clone types. These findings confirm the practical applicability of hybrid intermediate representations for semantic clone detection and suggest promising directions for future research using transformer-based models and large-scale deployment.
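The recall and F1-scores quoted above are standard pair-level classification metrics. As a minimal sketch of how such figures are typically computed, assuming each candidate code pair carries a ground-truth label and a predicted label (the function name and the toy labels below are hypothetical and are not taken from the dataset or the study's code):

```python
def precision_recall_f1(truth, predicted):
    """Compute precision, recall and F1 for binary clone / non-clone labels.

    truth and predicted are equal-length sequences of booleans, one entry
    per candidate code pair (True means "the pair is a clone").
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    tp = sum(1 for t, p in zip(truth, predicted) if t and p)        # clones found
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)    # false alarms
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)    # clones missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels for five candidate pairs (illustrative only).
    truth = [True, True, True, False, False]
    predicted = [True, True, False, True, False]
    p, r, f = precision_recall_f1(truth, predicted)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Benchmark evaluations commonly report recall separately for each clone category; the same function could be applied to the subset of pairs belonging to one category at a time.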
Created:
2026-01-20



