DetectVul/devign

Name: DetectVul/devign
Creator: DetectVul
Published: 2024-09-15 13:21:57
License: No description available

Hugging Face, updated 2024-09-15, indexed 2025-04-26

Download link:

https://hf-mirror.com/datasets/DetectVul/devign

Resource description:

---
dataset_info:
  features:
  - name: id
    dtype: int32
  - name: func
    dtype: string
  - name: target
    dtype: bool
  - name: project
    dtype: string
  - name: commit_id
    dtype: string
  - name: func_clean
    dtype: string
  - name: vul_lines
    struct:
    - name: code
      sequence: string
    - name: line_no
      sequence: int64
  - name: normalized_func
    dtype: string
  - name: lines
    sequence: string
  - name: label
    sequence: int64
  - name: line_no
    sequence:
      sequence: int64
  splits:
  - name: test
    num_bytes: 22801956
    num_examples: 2732
  - name: train
    num_bytes: 183794878
    num_examples: 21854
  - name: validation
    num_bytes: 22451009
    num_examples: 2732
  download_size: 72100845
  dataset_size: 229047843
---

# Dataset Card for "devign_with_norm_vul_lines"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

Original Paper: https://www.sciencedirect.com/science/article/abs/pii/S0167739X24004680

bibtex

```
@article{TRAN2024107504,
  title = {DetectVul: A statement-level code vulnerability detection for Python},
  journal = {Future Generation Computer Systems},
  pages = {107504},
  year = {2024},
  issn = {0167-739X},
  doi = {https://doi.org/10.1016/j.future.2024.107504},
  url = {https://www.sciencedirect.com/science/article/pii/S0167739X24004680},
  author = {Hoai-Chau Tran and Anh-Duy Tran and Kim-Hung Le},
  keywords = {Source code vulnerability detection, Deep learning, Natural language processing},
  abstract = {Detecting vulnerabilities in source code using graph neural networks (GNN) has gained significant attention in recent years. However, the detection performance of these approaches relies highly on the graph structure, and constructing meaningful graphs is expensive. Moreover, they often operate at a coarse level of granularity (such as function-level), which limits their applicability to other scripting languages like Python and their effectiveness in identifying vulnerabilities.
  To address these limitations, we propose DetectVul, a new approach that accurately detects vulnerable patterns in Python source code at the statement level. DetectVul applies self-attention to directly learn patterns and interactions between statements in a raw Python function; thus, it eliminates the complicated graph extraction process without sacrificing model performance. In addition, the information about each type of statement is also leveraged to enhance the model’s detection accuracy. In our experiments, we used two datasets, CVEFixes and Vudenc, with 211,317 Python statements in 21,571 functions from real-world projects on GitHub, covering seven vulnerability types. Our experiments show that DetectVul outperforms GNN-based models using control flow graphs, achieving the best F1 score of 74.47%, which is 25.45% and 18.05% higher than the best GCN and GAT models, respectively.}
}
```
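
The schema above lists a `lines` sequence of strings next to a `label` sequence of integers. As a minimal sketch, assuming these two fields are parallel (one label per statement) and that a label of 1 marks a vulnerable statement, neither of which the card states, the flagged statements of a record could be extracted as follows. The `flagged_statements` and `load_split` helpers and the example record are hypothetical and written only for illustration; the repository id comes from the download link above.

```python
from typing import Any, Dict, List, Tuple


def flagged_statements(record: Dict[str, Any]) -> List[Tuple[int, str]]:
    """Return (index, statement) pairs whose label equals 1.

    Assumption: `lines` and `label` are parallel sequences and label 1
    means "vulnerable". The dataset card does not confirm either point.
    """
    lines = record["lines"]
    labels = record["label"]
    if len(lines) != len(labels):
        raise ValueError("lines and label differ in length")
    return [(i, stmt) for i, (stmt, lab) in enumerate(zip(lines, labels)) if lab == 1]


def load_split(split: str = "test"):
    """Load one split from the Hugging Face Hub.

    Needs the `datasets` package and network access, so it is defined
    here but not called.
    """
    from datasets import load_dataset

    return load_dataset("DetectVul/devign", split=split)


# Hypothetical record shaped like the schema; it is not taken from the dataset.
example = {
    "id": 0,
    "target": True,
    "lines": ["char buf[8];", "strcpy(buf, src);", "return 0;"],
    "label": [0, 1, 0],
}
flagged = flagged_statements(example)
print(flagged)
```

With real data, `flagged_statements(load_split("test")[0])` would apply the same pairing to the first test record, provided the parallel-sequence assumption holds.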

Provided by:

DetectVul

Collected Summary

Background and Challenges

Background Overview

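This dataset is part of the DetectVul project, which targets vulnerability detection in Python source code. It provides function-level and statement-level annotations, including vulnerable-line information and several feature fields. The dataset comes with training, validation, and test splits, is suited to developing deep-learning-based vulnerability detection models, and supports statement-level analysis to improve detection accuracy.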
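
The relative sizes of the three splits can be checked from the figures in the `dataset_info` metadata. The sketch below only does arithmetic on the numbers copied from that metadata; the variable names are illustrative.

```python
# Split sizes and byte counts copied from the dataset_info metadata.
splits = {
    "train": {"num_examples": 21854, "num_bytes": 183794878},
    "validation": {"num_examples": 2732, "num_bytes": 22451009},
    "test": {"num_examples": 2732, "num_bytes": 22801956},
}

total_examples = sum(s["num_examples"] for s in splits.values())
total_bytes = sum(s["num_bytes"] for s in splits.values())

# Share of all examples that falls into each split.
fractions = {name: s["num_examples"] / total_examples for name, s in splits.items()}

for name, frac in fractions.items():
    print(f"{name}: {splits[name]['num_examples']} examples ({frac:.1%})")
print(f"total: {total_examples} examples, {total_bytes} bytes")
```

The summed byte count matches the `dataset_size` value in the metadata, and the example counts correspond to roughly an 80/10/10 split.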

The content above was collected and summarized by 遇见数据集.
