Scalable Vulnerability Classification: Siamese Network with LLMs for CVE-CWE Mapping

Source: Figshare | Updated: 2026-01-03 | Indexed: 2026-04-28

Download link:

https://figshare.com/articles/dataset/Scalable_Vulnerability_Classification_Siamese_Network_with_LLMs_for_CVE-CWE_Mapping/30994396

Description:

Abstract

Context: Software bugs often represent critical security vulnerabilities that can lead to unauthorized access and data leakage. To systematically classify such vulnerabilities, the security community relies on two key resources: the Common Weakness Enumeration (CWE) taxonomy and the Common Vulnerabilities and Exposures (CVE) database.

Objective: We propose a Siamese network-based framework combined with Large Language Models (LLMs) to automate the mapping of data-leak-related CVEs to their root cause CWEs. Unlike prior work that treats this as a generic multi-class classification task, we formulate it as a semantic ranking problem focused on distinguishing fine-grained information exposure weaknesses to better reflect real-world analyst workflows.

Method: We utilize a "Hybrid Hard-Negative Mining" strategy that trains the model to distinguish between semantically similar but incorrect weakness categories. We further employ a CVE-disjoint split to ensure the model generalizes to unseen vulnerabilities rather than memorizing descriptions.

Results: Experimental results demonstrate that our framework effectively retrieves the correct weakness from a focused taxonomy of 115 data-leakage categories. On a rigorous ranking benchmark, the model achieves a Recall@10 of 92.8% and a Mean Reciprocal Rank (MRR) of 72.4%.

Conclusions: This automated mapping expedites the classification of critical data exposure vulnerabilities and reduces the dependency on labor-intensive manual processes, offering a scalable solution for more efficient vulnerability management.
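The two ranking metrics named in the Results, Recall@K and Mean Reciprocal Rank, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the placeholder CVE identifiers (`CVE-A`, `CVE-B`), and the toy rankings are assumptions made for illustration, and the sketch assumes each CVE has exactly one correct CWE.

```python
from typing import Dict, List, Sequence


def reciprocal_rank(ranked: Sequence[str], gold: str) -> float:
    """Return 1/rank of the gold CWE (1-indexed), or 0.0 if it is not ranked."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == gold:
            return 1.0 / position
    return 0.0


def recall_at_k(ranked: Sequence[str], gold: str, k: int) -> float:
    """Return 1.0 if the gold CWE appears in the top-k candidates, else 0.0."""
    return 1.0 if gold in ranked[:k] else 0.0


def evaluate(
    rankings: Dict[str, List[str]], gold: Dict[str, str], k: int = 10
) -> Dict[str, float]:
    """Average Recall@k and reciprocal rank over all evaluated CVEs."""
    if not rankings:
        raise ValueError("rankings must contain at least one CVE")
    n = len(rankings)
    recall = sum(recall_at_k(r, gold[cve], k) for cve, r in rankings.items()) / n
    mrr = sum(reciprocal_rank(r, gold[cve]) for cve, r in rankings.items()) / n
    return {f"recall@{k}": recall, "mrr": mrr}


# Toy data (hypothetical): each CVE maps to CWE candidates sorted by model score.
rankings = {
    "CVE-A": ["CWE-200", "CWE-209", "CWE-532"],  # gold label ranked 1st
    "CVE-B": ["CWE-209", "CWE-532", "CWE-200"],  # gold label ranked 2nd
}
gold = {"CVE-A": "CWE-200", "CVE-B": "CWE-532"}

scores = evaluate(rankings, gold, k=1)
print(scores)
```

With this toy input, only `CVE-A` has its correct weakness in first place, so Recall@1 is 0.5, while the reciprocal ranks 1 and 1/2 average to an MRR of 0.75. Under this definition, an MRR of 72.4% reflects how close to the top the correct CWE tends to appear, averaged across CVEs.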

Created:

2026-01-03
