ChamaraVishwajithRajapaksha/Code_Vulnerability_Dataset

Name: ChamaraVishwajithRajapaksha/Code_Vulnerability_Dataset
Creator: ChamaraVishwajithRajapaksha
Published: 2026-04-16 11:03:18
License: No description provided

Hugging Face · Updated 2026-04-16 · Indexed 2026-04-26

Download link:

https://hf-mirror.com/datasets/ChamaraVishwajithRajapaksha/Code_Vulnerability_Dataset

Resource description:

---
license: mit
tags:
- security
- cwe
- vulnerability
- code-analysis
- software-security
- dataset
- machine-learning
- llm-finetuning
---

# 🔐 Code Vulnerability Dataset (CWE-Enriched)

## 📌 Overview

This dataset is built from the **bstee615/diversevul** dataset and enhanced with structured vulnerability intelligence from the **MITRE Common Weakness Enumeration (CWE)** database.

It provides a rich, machine-readable representation of software vulnerabilities, mapping raw vulnerable code samples to standardized CWE classifications.

The dataset is designed for research and development in:

- Vulnerability detection models
- Secure code generation
- LLM fine-tuning for cybersecurity tasks
- Static analysis and code understanding systems

---

## 🧠 Dataset Enrichment Process

Each sample in the dataset has been augmented using the MITRE CWE API to include structured security intelligence such as:

- CWE identifier (e.g., CWE-787)
- Vulnerability type (e.g., Out-of-bounds Write)
- Human-readable description
- Severity / exploit likelihood
- Impact categories (e.g., code execution, crash)
- Applicable programming languages
- Security classification metadata

---

## 📊 Data Structure

Each row in the dataset contains:

### 🔹 Original Fields

- `func` → Source code snippet
- `cwe` → Original CWE labels from DiverseVul dataset

### 🔹 Enriched Field

- `cwe_details` → JSON object containing structured CWE metadata:

```json
{
  "cwe_id": "CWE-787",
  "vulnerability_type": "Out-of-bounds Write",
  "description": "The product writes data past the end, or before the beginning, of the intended buffer.",
  "severity": "High",
  "category": "Memory Corruption",
  "impact": [
    "Modify Memory",
    "Execute Unauthorized Code",
    "Crash (DoS)"
  ],
  "languages": ["C", "C++"],
  "example": "Example not extracted"
}
```
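As a rough illustration of how the row layout described above might be consumed, the sketch below parses the `cwe_details` field and filters on it. Several things here are assumptions, not facts from the dataset card: the helper names `parse_cwe_details` and `is_high_severity_for` are hypothetical, the `func` snippet in `SAMPLE_ROW` is invented for the example, the `cwe` field is assumed to be a list of labels, and `cwe_details` is assumed to arrive either as a JSON string or as an already-decoded dict.

```python
import json
from typing import Any

# Hand-written sample row that mirrors the documented fields.
# The `func` value is a made-up snippet, not taken from the dataset.
SAMPLE_ROW: dict[str, Any] = {
    "func": "void copy(char *dst, const char *src) { strcpy(dst, src); }",
    "cwe": ["CWE-787"],  # assumption: labels are stored as a list
    "cwe_details": json.dumps({
        "cwe_id": "CWE-787",
        "vulnerability_type": "Out-of-bounds Write",
        "severity": "High",
        "category": "Memory Corruption",
        "impact": ["Modify Memory", "Execute Unauthorized Code", "Crash (DoS)"],
        "languages": ["C", "C++"],
        "example": "Example not extracted",
    }),
}


def parse_cwe_details(row: dict[str, Any]) -> dict[str, Any]:
    """Return `cwe_details` as a dict, decoding it first if it is a JSON string."""
    details = row["cwe_details"]
    if isinstance(details, str):
        details = json.loads(details)
    return details


def is_high_severity_for(row: dict[str, Any], language: str) -> bool:
    """True if the row's CWE is marked 'High' severity and lists `language`."""
    details = parse_cwe_details(row)
    return (
        details.get("severity") == "High"
        and language in details.get("languages", [])
    )


if __name__ == "__main__":
    details = parse_cwe_details(SAMPLE_ROW)
    print(details["cwe_id"], "-", details["vulnerability_type"])
    print("High severity for C:", is_high_severity_for(SAMPLE_ROW, "C"))
```

The same helpers could be applied to rows loaded from the actual dataset, but the real storage type of `cwe_details` and the available splits should be checked against the data before relying on this.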

Provided by:

ChamaraVishwajithRajapaksha
