A Comprehensive Technique to Predict the Size of Maintenance Issues

Figshare, updated 2018-10-05, indexed 2026-04-28

Download link:

https://figshare.com/articles/dataset/A_Holistic_Developer-Oriented_Model_for_Estimating_the_Effort_of_Maintenance_Tasks/5726854

Description:

Software systems continuously evolve over time because of changes in the requirements, code refactoring, or bug-fixing activities. One way to quantify the extent of a change is code churn, which is the sum of the lines of code added, modified, and deleted by a developer to perform that change. Previous research showed that practitioners can use code churn for early evaluation of defect density and the presence of vulnerabilities, or simply to monitor the impact of a code change. We argue that an automated software analytics technique able to inform developers of the amount of code churn needed to resolve a maintenance issue might be useful when assessing the complexity of that issue and the possible hidden risks behind it (e.g., this task is critical, so it should be tested more or additional resources are needed). In this paper, we present a novel code churn prediction model that uses a mix of product, process, and developer-related factors never used in this context to output a nominal value indicating an estimate of the category of code churn for a given maintenance issue. We employ the model in a large-scale empirical study involving 17 open-source software systems, comparing it with baselines relying on (i) only product metrics, (ii) only process metrics, and (iii) a combination of product and process metrics. We show that the proposed model is fairly accurate in its estimations, reaching up to 70% F-Measure and 80% AUC-ROC. Furthermore, its performance is statistically better than that of the baseline models in 88% of the cases.
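To make the definition of code churn concrete, the following is a minimal sketch of how churn for one maintenance issue could be computed and then mapped to a nominal category. The `Change` record, the function names, and the category boundaries are all assumptions made for illustration; the description above does not state how the dataset stores line counts or where the category thresholds lie.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """Line counts for one change made while resolving a maintenance issue."""
    added: int
    modified: int
    deleted: int


def code_churn(changes):
    """Sum the added, modified, and deleted lines over all changes of an issue."""
    return sum(c.added + c.modified + c.deleted for c in changes)


# Hypothetical category boundaries, NOT taken from the paper or the dataset.
CATEGORY_BOUNDS = [(10, "small"), (100, "medium")]


def churn_category(churn):
    """Map a churn value to a nominal label using the assumed bounds above."""
    for upper, label in CATEGORY_BOUNDS:
        if churn <= upper:
            return label
    return "large"


# Two illustrative changes attached to the same issue.
issue_changes = [
    Change(added=12, modified=3, deleted=5),
    Change(added=0, modified=2, deleted=1),
]
total = code_churn(issue_changes)
print(total, churn_category(total))
```

In the study described above, the category is predicted from product, process, and developer-related factors before the change is made; this sketch only shows how the target label could be derived once the actual churn is known.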


Created:

2018-10-05
