CWE dataset
arXiv, indexed 2025-09-30
Download link:
https://huggingface.co/datasets/floxihunter/synthetic_python_cwe
Description:
This dataset is a 500-sample collection developed for detecting Common Weakness Enumeration (CWE) weaknesses in Python code. The samples were generated with a Large Language Model (LLM)-based data synthesis workflow and then manually reviewed. The dataset was also used to fine-tune the codegen-mono model, which reached 99% accuracy on the CWE detection task after fine-tuning.
Provider:
floxihunter
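
The description reports accuracy on the CWE detection task but does not say how it was computed. As a minimal sketch, assuming accuracy means exact-match agreement between one predicted and one expected CWE label per sample, it could be calculated as below. The function name and the label values are hypothetical and are not taken from the dataset.

```python
def accuracy(predicted, expected):
    """Fraction of samples whose predicted CWE label equals the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty sample set")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Hypothetical labels for illustration only; the real dataset schema may differ.
expected = ["CWE-89", "CWE-79", "CWE-22", "CWE-78"]
predicted = ["CWE-89", "CWE-79", "CWE-22", "CWE-79"]

score = accuracy(predicted, expected)
print(f"accuracy = {score:.2f}")
```

With only 500 samples, a figure such as 99% depends heavily on how the evaluation split was drawn, so the split should be stated alongside any reported score.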



