
Dataset used for paper "Issues-Driven Features for Software Fault Prediction"

Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/7266447
Description:
Dataset used for the paper "Issues-Driven Features for Software Fault Prediction".

The dataset contains 86 Java projects from the open source organizations Apache and Spring. Each project manages its source code with the Git version control system and tracks issues in an issue tracking system (JIRA or Bugzilla).

For each project, we extracted data for the software fault prediction (SFP) task as follows. First, we filtered out projects with no reported resolved bugs or with fewer than 5 released versions. Then we iterated over the resolved bugs and mapped each one to the commits that fixed it. Next, for each version, we labeled the faulty files in that version. A faulty file is a file that was modified in a commit in the version that resolved a bug. The labeling methodology is a variant of the well-known SZZ algorithm that accounts for its vulnerabilities. Finally, for each project, we filtered out versions whose ratio of faulty files is lower than 5% or higher than 30%. The remaining set includes a good representation of bugs and reduces the class imbalance produced by the low number of defects. In addition, filtering out versions with a low number of bugs helps prune outliers, such as a version created to fix specific issues.

Extracted using the Beirut repository mining tool: https://github.com/beirut-repository-mining/repository_mining
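The labeling and version-filtering steps described above can be illustrated with a minimal Python sketch. This is not the code of the repository mining tool. The `Commit` record and the functions `label_faulty_files` and `keep_versions` are hypothetical names introduced here for illustration, and the sketch assumes each commit has already been assigned to a version and linked to the bug it resolves (if any).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Commit:
    """Hypothetical record: a commit already mapped to a version and a bug."""
    version: str                     # released version the commit belongs to
    files: Set[str]                  # files modified by the commit
    fixed_bug: Optional[str] = None  # id of the resolved bug it fixes, if any


def label_faulty_files(commits: List[Commit]) -> Dict[str, Set[str]]:
    """Map each version to the files modified by its bug-fixing commits."""
    faulty: Dict[str, Set[str]] = {}
    for commit in commits:
        if commit.fixed_bug is not None:
            faulty.setdefault(commit.version, set()).update(commit.files)
    return faulty


def keep_versions(version_files: Dict[str, Set[str]],
                  faulty: Dict[str, Set[str]],
                  low: float = 0.05,
                  high: float = 0.30) -> List[str]:
    """Drop versions whose faulty-file ratio is below `low` or above `high`."""
    kept = []
    for version, files in version_files.items():
        if not files:
            continue  # a version with no files has no defined ratio
        ratio = len(faulty.get(version, set()) & files) / len(files)
        if low <= ratio <= high:
            kept.append(version)
    return kept
```

With ten files per version, a version where bug-fixing commits touched two files (ratio 20%) is kept, while versions with no faulty files or with half of their files faulty are dropped.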
Created:
2022-11-01