
DetectVul/devign

Hugging Face | Updated: 2024-09-15 | Indexed: 2025-04-26
Download link:
https://hf-mirror.com/datasets/DetectVul/devign
Description:
---
dataset_info:
  features:
  - name: id
    dtype: int32
  - name: func
    dtype: string
  - name: target
    dtype: bool
  - name: project
    dtype: string
  - name: commit_id
    dtype: string
  - name: func_clean
    dtype: string
  - name: vul_lines
    struct:
    - name: code
      sequence: string
    - name: line_no
      sequence: int64
  - name: normalized_func
    dtype: string
  - name: lines
    sequence: string
  - name: label
    sequence: int64
  - name: line_no
    sequence:
      sequence: int64
  splits:
  - name: test
    num_bytes: 22801956
    num_examples: 2732
  - name: train
    num_bytes: 183794878
    num_examples: 21854
  - name: validation
    num_bytes: 22451009
    num_examples: 2732
  download_size: 72100845
  dataset_size: 229047843
---

# Dataset Card for "devign_with_norm_vul_lines"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

Original Paper: https://www.sciencedirect.com/science/article/abs/pii/S0167739X24004680

bibtex

```
@article{TRAN2024107504,
  title = {DetectVul: A statement-level code vulnerability detection for Python},
  journal = {Future Generation Computer Systems},
  pages = {107504},
  year = {2024},
  issn = {0167-739X},
  doi = {https://doi.org/10.1016/j.future.2024.107504},
  url = {https://www.sciencedirect.com/science/article/pii/S0167739X24004680},
  author = {Hoai-Chau Tran and Anh-Duy Tran and Kim-Hung Le},
  keywords = {Source code vulnerability detection, Deep learning, Natural language processing},
  abstract = {Detecting vulnerabilities in source code using graph neural networks (GNN) has gained significant attention in recent years. However, the detection performance of these approaches relies highly on the graph structure, and constructing meaningful graphs is expensive. Moreover, they often operate at a coarse level of granularity (such as function-level), which limits their applicability to other scripting languages like Python and their effectiveness in identifying vulnerabilities. To address these limitations, we propose DetectVul, a new approach that accurately detects vulnerable patterns in Python source code at the statement level. DetectVul applies self-attention to directly learn patterns and interactions between statements in a raw Python function; thus, it eliminates the complicated graph extraction process without sacrificing model performance. In addition, the information about each type of statement is also leveraged to enhance the model’s detection accuracy. In our experiments, we used two datasets, CVEFixes and Vudenc, with 211,317 Python statements in 21,571 functions from real-world projects on GitHub, covering seven vulnerability types. Our experiments show that DetectVul outperforms GNN-based models using control flow graphs, achieving the best F1 score of 74.47%, which is 25.45% and 18.05% higher than the best GCN and GAT models, respectively.}
}
```
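The metadata in the card can be turned into a quick sanity check. The sketch below is illustrative only: it copies the split sizes and feature names from the card, and it assumes that the repository id `DetectVul/devign` (taken from the download URL) can be loaded with the Hugging Face `datasets` library. The helper names `split_fractions`, `missing_fields`, and `load_devign` are hypothetical and not part of the dataset or the paper.

```python
from typing import Dict, Set

# Split sizes copied from the dataset card metadata.
SPLIT_SIZES: Dict[str, int] = {"train": 21854, "validation": 2732, "test": 2732}

# Top-level feature names listed in the dataset card.
EXPECTED_FIELDS: Set[str] = {
    "id", "func", "target", "project", "commit_id", "func_clean",
    "vul_lines", "normalized_func", "lines", "label", "line_no",
}


def split_fractions(sizes: Dict[str, int]) -> Dict[str, float]:
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def missing_fields(record: dict) -> Set[str]:
    """Return the card's feature names that are absent from a record."""
    return EXPECTED_FIELDS - set(record)


def load_devign(split: str = "train"):
    """Load one split from the Hub (assumes the repo id below is valid)."""
    # Imported lazily so the helpers above work without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("DetectVul/devign", split=split)
```

With the sizes listed in the card, `split_fractions(SPLIT_SIZES)` gives roughly an 80/10/10 train/validation/test division. Calling `missing_fields(load_devign("test")[0])` would be one way to confirm that a downloaded record matches the card's schema.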
Provider:
DetectVul
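Background overview
This dataset is part of the DetectVul project, whose accompanying paper targets statement-level vulnerability detection in Python source code. It provides function-level and statement-level annotations, including vulnerable-line information (`vul_lines`) and several derived feature fields. The data is divided into training, validation, and test splits, is suited to developing deep-learning-based vulnerability detection models, and supports statement-level analysis aimed at improving detection accuracy.
The summary above was collected and generated by 遇见数据集.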
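As a rough illustration of how the function-level and statement-level annotations might be used together, the sketch below pairs each statement with its label. It rests on an assumption the card does not confirm: that `lines` and `label` are parallel sequences and that a label of 1 marks a vulnerable statement. The helper `vulnerable_statements` and the sample record are hypothetical and were invented for this example.

```python
from typing import Dict, List, Tuple


def vulnerable_statements(record: Dict) -> List[Tuple[int, str]]:
    """Return (index, statement) pairs whose statement-level label is 1.

    Assumes `lines` and `label` are parallel lists, which the dataset
    card does not state explicitly.
    """
    lines = record["lines"]
    labels = record["label"]
    if len(lines) != len(labels):
        raise ValueError("lines and label must have the same length")
    return [
        (index, line)
        for index, (line, flag) in enumerate(zip(lines, labels))
        if flag == 1
    ]


# Hypothetical record for illustration only; not taken from the dataset.
example = {
    "target": True,
    "lines": ["char buf[8];", "strcpy(buf, src);", "return 0;"],
    "label": [0, 1, 0],
}

flagged = vulnerable_statements(example)
```

Here `flagged` holds the single statement marked as vulnerable along with its zero-based position. Whether those positions correspond to the values stored in `line_no` would need to be checked against real records.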