GitHub Projects for Defect Prediction
Source: arXiv. Indexed: 2025-09-30
Download link:
https://github.com/snaraya7/early-bird
Description:
This dataset contains 240 GitHub projects, each represented by data extracted from its commit history, and is intended for evaluating defect prediction models that use early heuristic methods. The projects are split into two groups based on their GitHub star ratings: 155 popular and 85 non-popular. They span a variety of programming languages, including Java, Python, C, and C++. The features used for analysis were extracted with the Commit Guru tool. The dataset's task is defect prediction.
Provided by:
Open source projects on GitHub



