
DetectVul/Vudenc

Source: Hugging Face. Updated: 2024-09-15. Indexed: 2025-04-26.
Download link: https://hf-mirror.com/datasets/DetectVul/Vudenc
Description:
---
dataset_info:
  features:
  - name: lines
    sequence: string
  - name: raw_lines
    sequence: string
  - name: label
    sequence: int64
  - name: type
    sequence: string
  splits:
  - name: train
    num_bytes: 14476057
    num_examples: 12672
  - name: test
    num_bytes: 3485317
    num_examples: 3169
  download_size: 7020615
  dataset_size: 17961374
---
# Dataset Card for "vul_lines"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

Original Paper: https://www.sciencedirect.com/science/article/abs/pii/S0167739X24004680

BibTeX:

```
@article{TRAN2024107504,
  title = {DetectVul: A statement-level code vulnerability detection for Python},
  journal = {Future Generation Computer Systems},
  pages = {107504},
  year = {2024},
  issn = {0167-739X},
  doi = {https://doi.org/10.1016/j.future.2024.107504},
  url = {https://www.sciencedirect.com/science/article/pii/S0167739X24004680},
  author = {Hoai-Chau Tran and Anh-Duy Tran and Kim-Hung Le},
  keywords = {Source code vulnerability detection, Deep learning, Natural language processing},
  abstract = {Detecting vulnerabilities in source code using graph neural networks (GNN) has gained significant attention in recent years. However, the detection performance of these approaches relies highly on the graph structure, and constructing meaningful graphs is expensive. Moreover, they often operate at a coarse level of granularity (such as function-level), which limits their applicability to other scripting languages like Python and their effectiveness in identifying vulnerabilities. To address these limitations, we propose DetectVul, a new approach that accurately detects vulnerable patterns in Python source code at the statement level. DetectVul applies self-attention to directly learn patterns and interactions between statements in a raw Python function; thus, it eliminates the complicated graph extraction process without sacrificing model performance. In addition, the information about each type of statement is also leveraged to enhance the model’s detection accuracy. In our experiments, we used two datasets, CVEFixes and Vudenc, with 211,317 Python statements in 21,571 functions from real-world projects on GitHub, covering seven vulnerability types. Our experiments show that DetectVul outperforms GNN-based models using control flow graphs, achieving the best F1 score of 74.47%, which is 25.45% and 18.05% higher than the best GCN and GAT models, respectively.}
}
```
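The card's metadata lists four sequence-valued features per example (`lines`, `raw_lines`, `label`, `type`). The sketch below shows one way such a record might be read, under two assumptions the card does not state: that the four sequences are aligned statement by statement, and that a label of `1` marks a vulnerable statement. The helper `flagged_statements` and the sample record are hypothetical and only illustrate the shape of the schema; the sample values are not taken from the dataset.

```python
from typing import List, Mapping, Sequence, Tuple

# Hypothetical record shaped like the card's schema (values are made up).
EXAMPLE_RECORD = {
    "lines": ["def run ( cmd ) :", "os . system ( cmd )", "return 0"],
    "raw_lines": ["def run(cmd):", "    os.system(cmd)", "    return 0"],
    "label": [0, 1, 0],
    "type": ["FunctionDef", "Expr", "Return"],
}


def flagged_statements(
    record: Mapping[str, Sequence], positive_label: int = 1
) -> List[Tuple[int, str, str]]:
    """Return (index, raw_line, statement_type) for statements whose label
    equals positive_label.

    Assumption: the four sequences are parallel, one entry per statement,
    and label == 1 means "vulnerable". Neither is confirmed by the card.
    """
    columns = ("lines", "raw_lines", "label", "type")
    lengths = {len(record[name]) for name in columns}
    if len(lengths) != 1:
        raise ValueError("expected the four sequences to have equal length")
    return [
        (index, raw, kind)
        for index, (raw, flag, kind) in enumerate(
            zip(record["raw_lines"], record["label"], record["type"])
        )
        if flag == positive_label
    ]


if __name__ == "__main__":
    for index, raw, kind in flagged_statements(EXAMPLE_RECORD):
        print(f"statement {index} ({kind}): {raw.strip()}")

    # With the Hugging Face `datasets` library installed and network access,
    # real records could presumably be fetched like this (not run here):
    #   from datasets import load_dataset
    #   train = load_dataset("DetectVul/Vudenc", split="train")
    #   print(flagged_statements(train[0]))
```

Keeping the helper independent of the `datasets` library means it can be checked on a hand-written record before any download; the equal-length check makes a violation of the alignment assumption fail loudly rather than silently truncating in `zip`.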

Provider: DetectVul