An Empirical Study of Deep Learning Models for Vulnerability Detection

Name: An Empirical Study of Deep Learning Models for Vulnerability Detection
Creator: figshare
Published: 2023-02-10 02:33:59
License: No description available

Source: DataCite Commons. Updated: 2023-02-10. Indexed: 2024-08-18.

Download link:

https://figshare.com/articles/dataset/An_Empirical_Study_of_Deep_Learning_Models_for_Vulnerability_Detection/20791240/2

Resource description:

Deep learning (DL) models of code have recently reported great progress for vulnerability detection. In some cases, DL-based models have outperformed static analysis tools. Although many great models have been proposed, we do not yet have a good understanding of these models. This limits the further advancement of model robustness, debugging, and deployment for vulnerability detection. In this paper, we surveyed and reproduced 9 state-of-the-art (SOTA) deep learning models on 2 widely used vulnerability detection datasets: Devign and MSR. We investigated 6 research questions in three areas, namely model capabilities, training data, and model interpretation. We experimentally demonstrated the variability between different runs of a model and the low agreement among different models’ outputs. We compared models trained for specific types of vulnerabilities against a model trained on all the vulnerabilities at once. We explored the types of programs DL may consider “hard” to handle. We investigated the relations of training data sizes and training data composition with model performance. Finally, we studied model interpretations and analyzed important features that the models used to make predictions. We believe that our findings can help better understand model results, provide guidance on preparing training data, and improve the robustness of the models.

Provider:

figshare

Created:

2023-02-08
