Software vulnerability detection datasets - function/method level
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/10266598
Description:
This dataset is for software vulnerability detection and includes source code in eight programming languages (C, C++, Java, JavaScript, Go, PHP, Ruby, Python). All data is collected from GitHub.
data{programming language}_vul.json: the set of vulnerable code samples in the given programming language.
data{programming language}_patch.json: the set of patched code samples in the given programming language.
Each source code sample includes the following 16 properties:
index: index of the code sample. If is_vulnerable==False, the index identifies the vulnerable sample that this code patches.
code: raw source code (may include comments).
is_vulnerable: the code is vulnerable (True) or a patch (False).
programming_language: programming language of the code.
method_name: name of the method.
file_name: name of the file from which the source code was extracted.
repo_url: URL of the project repository.
repo_owner: owner of the repository.
committer: developer who pushed the commit.
committer_date: date when the commit was pushed.
commit_msg: the commit message.
cwe_id: If is_vulnerable==True, the CWE ID; otherwise None.
cwe_name: If is_vulnerable==True, the name of the corresponding CWE; otherwise None.
cwe_description: If is_vulnerable==True, the description of the corresponding CWE; otherwise None.
cwe_url: If is_vulnerable==True, the URL for more details on the corresponding CWE; otherwise None.
cve_id: If is_vulnerable==True, the CVE ID; otherwise None.
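
Because a patch sample shares its index with the vulnerable sample it fixes, the two files for a language can be joined on that property. The following is a minimal sketch, not the dataset authors' tooling. It assumes each file holds a top-level JSON array of sample objects, which the description does not confirm. The helper names load_samples and pair_by_index are hypothetical, and the toy records and their values are illustrative only.

```python
import json
from collections import defaultdict


def load_samples(path):
    """Load one dataset file.

    Assumption: the file is a top-level JSON array of sample objects.
    Adjust this function if the real files use another layout.
    """
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def pair_by_index(vul_samples, patch_samples):
    """Return (vulnerable, patch) tuples for samples sharing the same index."""
    patches = defaultdict(list)
    for sample in patch_samples:
        patches[sample["index"]].append(sample)

    pairs = []
    for vul in vul_samples:
        # A vulnerable sample with no matching patch is skipped.
        for patch in patches.get(vul["index"], []):
            pairs.append((vul, patch))
    return pairs


# Toy in-memory records (illustrative values, not taken from the dataset).
vul = [
    {"index": 0, "code": "<vulnerable code>", "is_vulnerable": True},
    {"index": 1, "code": "<vulnerable code>", "is_vulnerable": True},
]
patch = [
    {"index": 0, "code": "<patched code>", "is_vulnerable": False},
]

pairs = pair_by_index(vul, patch)
```

With real data, the two lists would come from calling load_samples on the _vul.json and _patch.json files of one language. Grouping patches in a dictionary first keeps the join linear in the number of samples and tolerates a vulnerable sample that has several patches or none.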
Created:
2024-10-01



