
ChamaraVishwajithRajapaksha/Code_Vulnerability_Dataset

Source: Hugging Face. Updated 2026-04-16. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/ChamaraVishwajithRajapaksha/Code_Vulnerability_Dataset
Description:
---
license: mit
tags:
- security
- cwe
- vulnerability
- code-analysis
- software-security
- dataset
- machine-learning
- llm-finetuning
---

# 🔐 Code Vulnerability Dataset (CWE-Enriched)

## 📌 Overview

This dataset is built from the **bstee615/diversevul** dataset and enhanced with structured vulnerability intelligence from the **MITRE Common Weakness Enumeration (CWE)** database.

It provides a rich, machine-readable representation of software vulnerabilities, mapping raw vulnerable code samples to standardized CWE classifications.

The dataset is designed for research and development in:

- Vulnerability detection models
- Secure code generation
- LLM fine-tuning for cybersecurity tasks
- Static analysis and code understanding systems

---

## 🧠 Dataset Enrichment Process

Each sample in the dataset has been augmented using the MITRE CWE API to include structured security intelligence such as:

- CWE identifier (e.g., CWE-787)
- Vulnerability type (e.g., Out-of-bounds Write)
- Human-readable description
- Severity / exploit likelihood
- Impact categories (e.g., code execution, crash)
- Applicable programming languages
- Security classification metadata

---

## 📊 Data Structure

Each row in the dataset contains:

### 🔹 Original Fields

- `func` → Source code snippet
- `cwe` → Original CWE labels from DiverseVul dataset

### 🔹 Enriched Field

- `cwe_details` → JSON object containing structured CWE metadata:

```json
{
  "cwe_id": "CWE-787",
  "vulnerability_type": "Out-of-bounds Write",
  "description": "The product writes data past the end, or before the beginning, of the intended buffer.",
  "severity": "High",
  "category": "Memory Corruption",
  "impact": [
    "Modify Memory",
    "Execute Unauthorized Code",
    "Crash (DoS)"
  ],
  "languages": ["C", "C++"],
  "example": "Example not extracted"
}
```
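
As a rough illustration of how the `cwe_details` field might be consumed, the sketch below parses the example record shown in the card and checks which languages it applies to. It assumes the field arrives either as a JSON string or as an already-decoded mapping, which the card does not specify. The helper names `parse_cwe_details` and `affects_language` are hypothetical and not part of the dataset or any library.

```python
import json

# Example record copied from the card's `cwe_details` sample.
SAMPLE_CWE_DETAILS = """
{
  "cwe_id": "CWE-787",
  "vulnerability_type": "Out-of-bounds Write",
  "description": "The product writes data past the end, or before the beginning, of the intended buffer.",
  "severity": "High",
  "category": "Memory Corruption",
  "impact": ["Modify Memory", "Execute Unauthorized Code", "Crash (DoS)"],
  "languages": ["C", "C++"],
  "example": "Example not extracted"
}
"""


def parse_cwe_details(raw):
    """Return `cwe_details` as a dict.

    Assumption: the field is either a JSON-encoded string or a mapping.
    """
    if isinstance(raw, str):
        return json.loads(raw)
    return dict(raw)


def affects_language(details, language):
    """Case-insensitive check against the record's `languages` list."""
    wanted = language.lower()
    return any(lang.lower() == wanted for lang in details.get("languages", []))


if __name__ == "__main__":
    details = parse_cwe_details(SAMPLE_CWE_DETAILS)
    print(details["cwe_id"], "-", details["vulnerability_type"])
    print("Severity:", details["severity"])
    print("Applies to C++:", affects_language(details, "C++"))
```

Accepting both strings and mappings keeps the helper usable whether a loader returns the column as raw text or as a decoded object.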
Provider:
ChamaraVishwajithRajapaksha