DetectVul/devign
Source: Hugging Face | Updated: 2024-09-15 | Indexed: 2025-04-26
Download link:
https://hf-mirror.com/datasets/DetectVul/devign
Resource description:
---
dataset_info:
  features:
  - name: id
    dtype: int32
  - name: func
    dtype: string
  - name: target
    dtype: bool
  - name: project
    dtype: string
  - name: commit_id
    dtype: string
  - name: func_clean
    dtype: string
  - name: vul_lines
    struct:
    - name: code
      sequence: string
    - name: line_no
      sequence: int64
  - name: normalized_func
    dtype: string
  - name: lines
    sequence: string
  - name: label
    sequence: int64
  - name: line_no
    sequence:
      sequence: int64
  splits:
  - name: test
    num_bytes: 22801956
    num_examples: 2732
  - name: train
    num_bytes: 183794878
    num_examples: 21854
  - name: validation
    num_bytes: 22451009
    num_examples: 2732
  download_size: 72100845
  dataset_size: 229047843
---
# Dataset Card for "devign_with_norm_vul_lines"
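The split sizes declared in the metadata can be checked against a downloaded copy. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the repository ID `DetectVul/devign` (taken from the page title) resolves on the Hub. The helper names `split_fractions` and `load_split_sizes` are hypothetical and not part of the dataset card.

```python
from typing import Dict

# Split sizes copied from the `splits` section of the dataset metadata.
EXPECTED_SPLIT_SIZES: Dict[str, int] = {
    "train": 21854,
    "validation": 2732,
    "test": 2732,
}


def split_fractions(sizes: Dict[str, int]) -> Dict[str, float]:
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def load_split_sizes(repo_id: str = "DetectVul/devign") -> Dict[str, int]:
    """Download the dataset and report the row count of each split.

    Requires network access and the `datasets` package; the repository ID
    is an assumption based on the page title.
    """
    from datasets import load_dataset  # imported lazily: optional dependency

    dataset_dict = load_dataset(repo_id)
    return {name: split.num_rows for name, split in dataset_dict.items()}


# Offline check using only the numbers declared in the metadata.
fractions = split_fractions(EXPECTED_SPLIT_SIZES)
```

With the declared counts, the splits come out to roughly 80% train, 10% validation, and 10% test. Calling `load_split_sizes()` and comparing its result with `EXPECTED_SPLIT_SIZES` is one way to confirm that a local copy matches the card.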
Original Paper: https://www.sciencedirect.com/science/article/abs/pii/S0167739X24004680
BibTeX:
```
@article{TRAN2024107504,
title = {DetectVul: A statement-level code vulnerability detection for Python},
journal = {Future Generation Computer Systems},
pages = {107504},
year = {2024},
issn = {0167-739X},
doi = {https://doi.org/10.1016/j.future.2024.107504},
url = {https://www.sciencedirect.com/science/article/pii/S0167739X24004680},
author = {Hoai-Chau Tran and Anh-Duy Tran and Kim-Hung Le},
keywords = {Source code vulnerability detection, Deep learning, Natural language processing},
abstract = {Detecting vulnerabilities in source code using graph neural networks (GNN) has gained significant attention in recent years. However, the detection performance of these approaches relies highly on the graph structure, and constructing meaningful graphs is expensive. Moreover, they often operate at a coarse level of granularity (such as function-level), which limits their applicability to other scripting languages like Python and their effectiveness in identifying vulnerabilities. To address these limitations, we propose DetectVul, a new approach that accurately detects vulnerable patterns in Python source code at the statement level. DetectVul applies self-attention to directly learn patterns and interactions between statements in a raw Python function; thus, it eliminates the complicated graph extraction process without sacrificing model performance. In addition, the information about each type of statement is also leveraged to enhance the model’s detection accuracy. In our experiments, we used two datasets, CVEFixes and Vudenc, with 211,317 Python statements in 21,571 functions from real-world projects on GitHub, covering seven vulnerability types. Our experiments show that DetectVul outperforms GNN-based models using control flow graphs, achieving the best F1 score of 74.47%, which is 25.45% and 18.05% higher than the best GCN and GAT models, respectively.}
}
```
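Provided by:
DetectVul
Collected summary
Background and challenges
Background overview
This dataset is part of the DetectVul project, whose accompanying paper targets vulnerability detection in Python source code. It provides function-level and statement-level annotations, including vulnerable-line information (`vul_lines`) and per-line labels (`lines`, `label`), alongside fields such as `func`, `func_clean`, and `normalized_func`. It ships with train, validation, and test splits (21,854, 2,732, and 2,732 examples) and is suited to developing deep-learning vulnerability detection models, with statement-level analysis supported to improve detection precision.
The content above was collected and summarized by 遇见数据集.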
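The statement-level annotations can be read by pairing each entry of `lines` with the corresponding entry of `label`. The sketch below assumes that the two sequences are aligned index by index and that a label of `1` marks a vulnerable statement. Neither assumption is stated in the dataset card, so both should be verified against real records. The record shown is an invented toy example, and `flagged_statements` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def flagged_statements(record: Dict[str, list]) -> List[Tuple[int, str]]:
    """Return (index, statement) pairs whose label equals 1.

    Assumes `lines` and `label` have the same length and that 1 means
    "vulnerable"; this is an assumption, not documented behaviour.
    """
    lines = record["lines"]
    labels = record["label"]
    if len(lines) != len(labels):
        raise ValueError("`lines` and `label` must have the same length")
    return [
        (index, line)
        for index, (line, label) in enumerate(zip(lines, labels))
        if label == 1
    ]


# Invented toy record that mimics only the `lines`, `label`, and `target`
# fields of the schema; it is not taken from the dataset.
toy_record = {
    "lines": ["buf = alloc(16)", "n = read_len()", "copy(buf, src, n)"],
    "label": [0, 0, 1],
    "target": True,
}

flagged = flagged_statements(toy_record)
```

The length check is deliberate: if a record's `lines` and `label` sequences ever differ in length, failing loudly is safer than silently truncating with `zip`.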



