ChamaraVishwajithRajapaksha/Code_Vulnerability_Dataset
Source: Hugging Face. Updated 2026-04-16; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/ChamaraVishwajithRajapaksha/Code_Vulnerability_Dataset
---
license: mit
tags:
- security
- cwe
- vulnerability
- code-analysis
- software-security
- dataset
- machine-learning
- llm-finetuning
---
# 🔐 Code Vulnerability Dataset (CWE-Enriched)
## 📌 Overview
This dataset is built from the **bstee615/diversevul** dataset and enhanced with structured vulnerability intelligence from the **MITRE Common Weakness Enumeration (CWE)** database.
It provides a machine-readable representation of software vulnerabilities by mapping each raw vulnerable code sample to a standardized CWE classification.
The dataset is designed for research and development in:
- Vulnerability detection models
- Secure code generation
- LLM fine-tuning for cybersecurity tasks
- Static analysis and code understanding systems
---
## 🧠 Dataset Enrichment Process
Each sample in the dataset has been augmented using the MITRE CWE API to include structured security intelligence such as:
- CWE identifier (e.g., CWE-787)
- Vulnerability type (e.g., Out-of-bounds Write)
- Human-readable description
- Severity / exploit likelihood
- Impact categories (e.g., code execution, crash)
- Applicable programming languages
- Security classification metadata
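The enrichment step can be pictured as a lookup from each row's CWE label to a metadata record. The sketch below is a minimal illustration under stated assumptions, not the author's pipeline: `CWE_CATALOG`, `normalize_cwe`, and `enrich` are hypothetical names, the catalog is a hard-coded stand-in for data fetched from the MITRE CWE API, and only the first recognized label of a row is enriched.

```python
import json
import re
from typing import Optional

# Hypothetical stand-in for metadata retrieved from the MITRE CWE API.
# The single entry mirrors the sample `cwe_details` object in this card.
CWE_CATALOG = {
    "CWE-787": {
        "vulnerability_type": "Out-of-bounds Write",
        "description": (
            "The product writes data past the end, or before the beginning, "
            "of the intended buffer."
        ),
        "severity": "High",
        "category": "Memory Corruption",
        "impact": ["Modify Memory", "Execute Unauthorized Code", "Crash (DoS)"],
        "languages": ["C", "C++"],
    },
}

# Accepts "CWE-787", "cwe-787", or "CWE787".
_CWE_PATTERN = re.compile(r"CWE-?(\d+)", re.IGNORECASE)


def normalize_cwe(label: str) -> Optional[str]:
    """Return the canonical 'CWE-<n>' form of a label, or None if none is found."""
    match = _CWE_PATTERN.search(label)
    return f"CWE-{int(match.group(1))}" if match else None


def enrich(row: dict, catalog: dict) -> dict:
    """Return a copy of `row` with `cwe_details` built from its first known CWE label.

    Assumption: rows with no label found in `catalog` get `cwe_details = None`.
    """
    labels = row.get("cwe") or []
    if isinstance(labels, str):  # tolerate a bare string as well as a list
        labels = [labels]
    for label in labels:
        cwe_id = normalize_cwe(label)
        if cwe_id in catalog:
            return {**row, "cwe_details": {"cwe_id": cwe_id, **catalog[cwe_id]}}
    return {**row, "cwe_details": None}


row = {"func": "void f(char *buf) { buf[16] = 0; }", "cwe": ["CWE-787"]}
enriched = enrich(row, CWE_CATALOG)
print(json.dumps(enriched["cwe_details"], indent=2))
```

Keeping the catalog as a plain dictionary keeps the sketch offline and testable; populating it from real API responses would change only how `CWE_CATALOG` is built, not the lookup itself.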
---
## 📊 Data Structure
Each row in the dataset contains:
### 🔹 Original Fields
- `func` → Source code snippet
- `cwe` → Original CWE labels from the DiverseVul dataset
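Because a single function can carry more than one label, a quick way to inspect the original `cwe` field is to count label frequencies. This is a minimal sketch: `cwe_label_counts` is a hypothetical helper, the sample rows are made up for illustration, and it assumes `cwe` holds a list of label strings (a bare string is also tolerated).

```python
from collections import Counter
from typing import Iterable, Mapping


def cwe_label_counts(rows: Iterable[Mapping]) -> Counter:
    """Count how often each original CWE label appears across rows."""
    counts = Counter()
    for row in rows:
        labels = row.get("cwe") or []
        # Assumption: labels are usually a list, but accept a single string too.
        if isinstance(labels, str):
            labels = [labels]
        counts.update(labels)
    return counts


# Illustrative rows only; not taken from the dataset.
sample_rows = [
    {"func": "int a(void) { return 0; }", "cwe": ["CWE-787"]},
    {"func": "int b(void) { return 1; }", "cwe": ["CWE-787", "CWE-125"]},
    {"func": "int c(void) { return 2; }", "cwe": []},
]
print(cwe_label_counts(sample_rows).most_common())
# [('CWE-787', 2), ('CWE-125', 1)]
```

To run it on the full dataset, `rows` could come from the Hugging Face `datasets` library, e.g. `load_dataset("ChamaraVishwajithRajapaksha/Code_Vulnerability_Dataset", split="train")`; the `train` split name is an assumption and may differ.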
### 🔹 Enriched Field
- `cwe_details` → JSON object containing structured CWE metadata:
```json
{
  "cwe_id": "CWE-787",
  "vulnerability_type": "Out-of-bounds Write",
  "description": "The product writes data past the end, or before the beginning, of the intended buffer.",
  "severity": "High",
  "category": "Memory Corruption",
  "impact": [
    "Modify Memory",
    "Execute Unauthorized Code",
    "Crash (DoS)"
  ],
  "languages": ["C", "C++"],
  "example": "Example not extracted"
}
```
Provided by: ChamaraVishwajithRajapaksha



