Rediscovery Datasets: Connecting Duplicate Reports of Apache, Eclipse, and KDE

Source: NIAID Data Ecosystem. Indexed 2026-05-02.

Download link:

https://zenodo.org/records/400614

Resource description:

We present three defect rediscovery datasets mined from Bugzilla. The datasets cover three groups of open source software projects: Apache, Eclipse, and KDE. Together they contain information on approximately 914 thousand defect reports filed over 18 years (1999-2017), and they capture the inter-relationships among duplicate defects.

File Descriptions

apache.csv - Apache Defect Rediscovery dataset
eclipse.csv - Eclipse Defect Rediscovery dataset
kde.csv - KDE Defect Rediscovery dataset
apache.relations.csv - Inter-relations of rediscovered defects of Apache
eclipse.relations.csv - Inter-relations of rediscovered defects of Eclipse
kde.relations.csv - Inter-relations of rediscovered defects of KDE
create_and_populate_neo4j_objects.cypher - Populates a Neo4j graph database by importing all the data from the CSV files. Note that you must set the dbms.import.csv.legacy_quote_escaping configuration setting to false to load the CSV files, as per https://neo4j.com/docs/operations-manual/current/reference/configuration-settings/#config_dbms.import.csv.legacy_quote_escaping
create_and_populate_mysql_objects.sql - Populates a MySQL RDBMS by importing all the data from the CSV files
rediscovery_db_mysql.zip - For your convenience, we also provide a full backup of the MySQL database
neo4j_examples.txt - Sample Neo4j queries
mysql_examples.txt - Sample MySQL queries
rediscovery_eclipse_6325.png - Output of Neo4j example #1
distinct_attrs.csv - Distinct values of bug_status, resolution, priority, and severity for each project
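For readers who want to explore the relations files without setting up Neo4j or MySQL, the following is a minimal Python sketch of one possible approach: grouping reports that are linked to each other as duplicates. It assumes that each row of a relations file links two report IDs. The column names `src_id` and `dst_id`, the helper `group_duplicates`, and the sample rows are hypothetical and used only for illustration; check the actual CSV header before adapting it.

```python
import csv
import io
from collections import defaultdict


def group_duplicates(relation_rows, src_col="src_id", dst_col="dst_id"):
    """Group report IDs that are connected by relation rows (union-find).

    src_col and dst_col are assumed column names, not taken from the dataset.
    """
    parent = {}

    def find(x):
        # Register unseen IDs as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for row in relation_rows:
        a, b = find(row[src_col]), find(row[dst_col])
        if a != b:
            parent[b] = a  # merge the two groups

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    # Sort for a stable, readable result.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Made-up sample rows standing in for a file such as apache.relations.csv.
SAMPLE = "src_id,dst_id\n101,100\n102,100\n201,200\n"
rows = csv.DictReader(io.StringIO(SAMPLE))
duplicate_groups = group_duplicates(rows)
print(duplicate_groups)
```

To try this on a real file, replace the `io.StringIO(SAMPLE)` source with `open("apache.relations.csv", newline="")` and pass the actual column names found in that file's header. Union-find is used here only because it groups transitively linked reports in a single pass; the authors' own Neo4j and MySQL scripts may model the relations differently.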

Created:

2024-08-03
