Scalable Vulnerability Classification: Siamese Network with LLMs for CVE-CWE Mapping

Source: Figshare. Updated 2026-01-03; indexed 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Scalable_Vulnerability_Classification_Siamese_Network_with_LLMs_for_CVE-CWE_Mapping/30994396
Description:
Abstract

Context: Software bugs often constitute critical security vulnerabilities that can lead to unauthorized access and data leakage. To classify such vulnerabilities systematically, the security community relies on two key resources: the Common Weakness Enumeration (CWE) taxonomy and the Common Vulnerabilities and Exposures (CVE) database.

Objective: We propose a Siamese network-based framework combined with Large Language Models (LLMs) to automate the mapping of data-leak-related CVEs to their root-cause CWEs. Unlike prior work that treats this as a generic multi-class classification task, we formulate it as a semantic ranking problem focused on distinguishing fine-grained information-exposure weaknesses, which better reflects real-world analyst workflows.

Method: We use a "Hybrid Hard-Negative Mining" strategy that trains the model to distinguish between semantically similar but incorrect weakness categories. We further employ a CVE-disjoint split to ensure that the model generalizes to unseen vulnerabilities rather than memorizing descriptions.

Results: Experimental results show that our framework retrieves the correct weakness from a focused taxonomy of 115 data-leakage categories. On a rigorous ranking benchmark, the model achieves a Recall@10 of 92.8% and a Mean Reciprocal Rank (MRR) of 72.4%.

Conclusions: This automated mapping expedites the classification of critical data-exposure vulnerabilities and reduces dependence on labor-intensive manual processes, offering a scalable solution for more efficient vulnerability management.
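The abstract does not specify how the CVE-disjoint split or the "Hybrid Hard-Negative Mining" strategy is implemented. The Python sketch below is one plausible reading, not the authors' code. It makes several assumptions: the data are (CVE ID, CWE ID) pairs; "hybrid" means mixing the most similar incorrect CWEs (hard negatives) with randomly drawn incorrect ones; and toy embedding vectors stand in for the LLM-based Siamese encoder. The names `cve_disjoint_split`, `hybrid_negatives`, `n_hard`, and `n_random` are hypothetical.

```python
import random
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def cve_disjoint_split(pairs: Sequence[Tuple[str, str]], test_frac: float = 0.2, seed: int = 0):
    """Hypothetical helper: split (cve_id, cwe_id) pairs so no CVE ID is in both train and test."""
    by_cve: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for cve, cwe in pairs:
        by_cve[cve].append((cve, cwe))
    cves = sorted(by_cve)                 # deterministic order before shuffling
    random.Random(seed).shuffle(cves)
    test_ids = set(cves[: int(len(cves) * test_frac)])
    # All pairs of a given CVE land on the same side of the split.
    train = [p for c in cves if c not in test_ids for p in by_cve[c]]
    test = [p for c in cves if c in test_ids for p in by_cve[c]]
    return train, test


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hybrid_negatives(cve_vec: Sequence[float], cwe_vecs: Dict[str, Sequence[float]], gold: str,
                     n_hard: int = 3, n_random: int = 2, seed: int = 0) -> List[str]:
    """Assumed 'hybrid' scheme: n_hard most similar wrong CWEs plus n_random other wrong CWEs."""
    wrong = [c for c in cwe_vecs if c != gold]
    # Hard negatives: incorrect CWEs whose embeddings are closest to the CVE embedding.
    wrong.sort(key=lambda c: cosine(cve_vec, cwe_vecs[c]), reverse=True)
    hard, rest = wrong[:n_hard], wrong[n_hard:]
    # Random negatives drawn from the remaining incorrect CWEs.
    rand = random.Random(seed).sample(rest, min(n_random, len(rest)))
    return hard + rand
```

Splitting by CVE ID rather than by pair matters when one CVE maps to several CWEs: a pair-level split could place the same CVE description in both training and test data, which is the memorization the abstract aims to rule out.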
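The two reported ranking metrics, Recall@10 and MRR, can be computed from a ranked list of candidate CWE IDs per CVE. This is a minimal sketch that assumes one gold CWE per CVE. The function names are illustrative, not taken from the dataset. MRR lies in [0, 1], so the reported 72.4% corresponds to 0.724.

```python
from typing import Sequence


def recall_at_k(ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int = 10) -> float:
    """Fraction of queries whose gold CWE appears among the top-k ranked candidates."""
    hits = sum(1 for cands, g in zip(ranked, gold) if g in list(cands)[:k])
    return hits / len(gold)


def mean_reciprocal_rank(ranked: Sequence[Sequence[str]], gold: Sequence[str]) -> float:
    """Mean of 1/rank of the gold CWE; a query contributes 0 if the gold CWE is not ranked."""
    total = 0.0
    for cands, g in zip(ranked, gold):
        cands = list(cands)
        if g in cands:
            total += 1.0 / (cands.index(g) + 1)  # ranks are 1-based
    return total / len(gold)


# Toy example: three CVEs, each with three ranked candidate CWEs.
ranked = [["CWE-200", "CWE-532", "CWE-209"],
          ["CWE-532", "CWE-209", "CWE-200"],
          ["CWE-312", "CWE-319", "CWE-311"]]
gold = ["CWE-200", "CWE-209", "CWE-359"]
print(recall_at_k(ranked, gold, k=3))      # 2 of 3 gold CWEs are in the top 3
print(mean_reciprocal_rank(ranked, gold))  # (1 + 1/2 + 0) / 3 = 0.5
```

Recall@k only checks whether the correct weakness appears in the shortlist an analyst would inspect. MRR additionally rewards placing it near the top.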
Created: 2026-01-03