Dataset of duplicate vulnerability records across databases
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14580766
Description:
This dataset contains pairs of duplicate vulnerability records from two sources: duplicates identified across multiple vulnerability databases, and duplicates from the GitHub Advisory Database. It is provided in JSON format and is intended for research on vulnerability matching and duplicate detection.
## Dataset Overview
The dataset consists of two files:
1. **cross_database_duplicates.json**: Contains 22,163 pairs of duplicate vulnerabilities identified across multiple databases.
2. **github_advisory_database_duplicates.json**: Contains 133 pairs of duplicate vulnerabilities specifically from the GitHub Advisory Database.
## File Format
Both files are in JSON format. Each record consists of four attributes:
- `id_1`: The ID of the first vulnerability report.
- `id_2`: The ID of the second vulnerability report.
- `record_1`: The first vulnerability report.
- `record_2`: The second vulnerability report.
These attributes are designed to help users identify and compare vulnerability reports that are considered duplicates.
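As a minimal sketch of reading these records, the Python snippet below loads one of the files and checks that every record carries the four attributes listed above. It assumes the top level of each file is a JSON array of objects, which the description does not state. The helper names `iter_pairs` and `load_pairs` and the sample IDs are hypothetical.

```python
import json
from typing import Any, Iterator

# The four attributes documented for each record.
REQUIRED_KEYS = ("id_1", "id_2", "record_1", "record_2")


def iter_pairs(data: Any) -> Iterator[dict]:
    """Yield records that carry all four documented attributes.

    Assumption: the parsed file is a JSON array of objects.
    """
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array (assumption)")
    for index, record in enumerate(data):
        if not isinstance(record, dict):
            raise ValueError(f"record {index} is not a JSON object")
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise KeyError(f"record {index} is missing {missing}")
        yield record


def load_pairs(path: str) -> list[dict]:
    """Read a dataset file from disk and return its validated records."""
    with open(path, encoding="utf-8") as handle:
        return list(iter_pairs(json.load(handle)))


if __name__ == "__main__":
    # Made-up in-memory sample; real IDs and record fields will differ.
    sample = json.loads(
        '[{"id_1": "A-1", "id_2": "B-1",'
        ' "record_1": {"summary": "x"}, "record_2": {"summary": "y"}}]'
    )
    for pair in iter_pairs(sample):
        print(pair["id_1"], "<->", pair["id_2"])
```

With the files downloaded locally, `load_pairs("cross_database_duplicates.json")` would return the validated pairs, provided the array assumption holds.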
## Usage
This dataset can be used for studies in vulnerability matching, natural language processing (NLP) applications, and the development of tools for detecting duplicate vulnerabilities across databases.
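One way to start on vulnerability matching is a simple lexical baseline. The sketch below scores a pair by the Jaccard overlap of lowercase alphanumeric tokens in `record_1` and `record_2`. This is an illustrative baseline, not a method attributed to the dataset authors. Because the internal structure of the records is not described, the hypothetical helper `to_text` passes strings through and serializes anything else to JSON text, so field names also count as tokens.

```python
import json
import re
from typing import Any


def to_text(record: Any) -> str:
    """Return strings unchanged; serialize any other value to JSON text."""
    if isinstance(record, str):
        return record
    return json.dumps(record, sort_keys=True)


def tokens(text: str) -> set[str]:
    """Split text into a set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(record_a: Any, record_b: Any) -> float:
    """Jaccard similarity of the two records' token sets, in [0, 1]."""
    a = tokens(to_text(record_a))
    b = tokens(to_text(record_b))
    if not a and not b:
        return 1.0  # two empty records are treated as identical
    return len(a & b) / len(a | b)
```

Scoring each labeled pair with `jaccard(pair["record_1"], pair["record_2"])` gives a rough picture of how lexically similar known duplicates are, which can serve as a reference point for stronger models.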
Created:
2024-12-31



