Dataset used for paper "Issues-Driven Features for Software Fault Prediction"
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/7266447
Description:
Dataset used for paper "Issues-Driven Features for Software Fault Prediction".
The dataset contains 86 Java projects from the open-source organizations Apache and Spring. Each project manages its source code with the Git version control system and uses an issue tracking system (JIRA or Bugzilla).
For each project, we extracted data for the software fault prediction (SFP) task as follows:
First, we filtered out projects with no reported resolved bugs or with fewer than 5 released versions.
Then we iterated over the resolved bugs and mapped each one to the commits that fixed it.
Next, for each version, we labeled the faulty files in the version. A faulty file is a file that was modified in a commit in the version that resolved a bug.
The methodology for labeling the files is a variant of the well-known approach implemented in the SZZ algorithm; the variant accounts for the known weaknesses of that approach.
Finally, for each project, we filtered out versions whose ratio of faulty files was lower than 5% or higher than 30%. The remaining set includes a good representation of bugs and reduces the class imbalance produced by the low number of defects. In addition, filtering out versions with a low number of bugs helps to prune outliers, such as a version created to fix specific issues.
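The filtering and labeling steps above can be sketched roughly as follows. This is a minimal illustration, not the code of the mining tool: the `Commit` record, the function names, and the in-memory inputs are all hypothetical, and the sketch assumes the bug-to-commit mapping has already been computed. The ratio bounds are treated as inclusive, since the description only says that versions below 5% or above 30% were dropped.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional, Set


@dataclass(frozen=True)
class Commit:
    """Hypothetical record of one commit (not the tool's real schema)."""
    sha: str
    version: str                      # released version the commit belongs to
    files: FrozenSet[str]             # files modified by the commit
    fixed_bug: Optional[str] = None   # key of the resolved bug it fixes, if any


def keep_project(n_resolved_bugs: int, n_versions: int, min_versions: int = 5) -> bool:
    """Step 1: keep projects with resolved bugs and at least 5 released versions."""
    return n_resolved_bugs > 0 and n_versions >= min_versions


def faulty_files(commits: Iterable[Commit], version: str) -> Set[str]:
    """Step 3: files modified in `version` by a commit that resolved a bug."""
    faulty: Set[str] = set()
    for commit in commits:
        if commit.version == version and commit.fixed_bug is not None:
            faulty |= commit.files
    return faulty


def select_versions(
    version_files: Dict[str, Set[str]],
    commits: Iterable[Commit],
    low: float = 0.05,
    high: float = 0.30,
) -> Dict[str, float]:
    """Step 4: keep versions whose faulty-file ratio lies within [low, high]."""
    commits = list(commits)
    kept: Dict[str, float] = {}
    for version, files in version_files.items():
        if not files:
            continue  # an empty version has no defined ratio
        ratio = len(faulty_files(commits, version) & files) / len(files)
        if low <= ratio <= high:
            kept[version] = ratio
    return kept


# Toy data, invented for illustration only.
version_files = {
    "1.0": {f"F{i}.java" for i in range(10)},   # 2 of 10 files faulty
    "1.1": {f"F{i}.java" for i in range(10)},   # no bug-fixing commits
    "1.2": {f"F{i}.java" for i in range(4)},    # 2 of 4 files faulty
}
commits = [
    Commit("a1", "1.0", frozenset({"F0.java", "F1.java"}), fixed_bug="BUG-1"),
    Commit("a2", "1.0", frozenset({"F2.java"})),  # not a bug fix
    Commit("b1", "1.2", frozenset({"F0.java", "F1.java"}), fixed_bug="BUG-2"),
]
selected = select_versions(version_files, commits)
print(sorted(selected))
```

In this toy input only version 1.0 survives: version 1.1 has no faulty files and version 1.2 has half of its files marked faulty, so both fall outside the 5% to 30% band.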
Extracted using the Beirut repository mining tool:
https://github.com/beirut-repository-mining/repository_mining
Created:
2022-11-01



