A Comprehensive Technique to Predict the Size of Maintenance Tasks

Name: A Comprehensive Technique to Predict the Size of Maintenance Tasks
Creator: figshare
Published: 2020-08-31 04:36:40
License: No description available

DataCite Commons. Updated: 2020-08-31. Indexed: 2024-07-27.

Download link:

https://figshare.com/articles/A_Holistic_Developer-Oriented_Model_for_Estimating_the_Effort_of_Maintenance_Tasks/5726854/2

Description:

Software systems continuously evolve over time because of changes in the requirements, code refactoring, or bug-fixing activities. One way to quantify the extent of a change is code churn, which represents the number of lines of code changed by a developer to perform that change. Previous research showed that practitioners can use code churn for early evaluation of defect density, to detect the presence of vulnerabilities, or simply to monitor the impact of a code change. We argue that an automated software analytics technique able to inform developers of the quantity of code needed to perform a maintenance task might be useful when estimating the likely effort needed to complete it or when assessing possible hidden risks. In this paper, we present a novel code churn prediction model that uses a mix of product, process, and developer-related factors to output a nominal value indicating an estimate of the code churn for a given maintenance task. We employ the model in a large-scale empirical study involving 17 open-source software systems, comparing it with baselines relying on (i) only product metrics, (ii) only process metrics, and (iii) a combination of product and process metrics. We show that the proposed model is accurate in its estimations, reaching up to 70% F-Measure and 80% AUC-ROC. Furthermore, it is statistically better than the baseline models in 88% of the cases.
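To make the notion of code churn concrete, the following is a minimal sketch that counts the changed lines in a unified diff. It assumes churn is the sum of added and removed lines; the dataset may define it differently. The function name `code_churn` and the sample diff are hypothetical illustrations, not part of the dataset or of the authors' model.

```python
def code_churn(unified_diff: str) -> int:
    """Count added plus removed lines in a unified diff (assumed definition)."""
    churn = 0
    for line in unified_diff.splitlines():
        # Skip the file headers, which also begin with '+' or '-'.
        # Simplification: a changed line whose own content starts with
        # '--' or '++' would be skipped as well.
        if line.startswith("+++") or line.startswith("---"):
            continue
        if line.startswith("+") or line.startswith("-"):
            churn += 1
    return churn


# Hypothetical diff: one line removed, two lines added.
example_diff = """\
--- a/util.py
+++ b/util.py
@@ -1,3 +1,4 @@
 def area(w, h):
-    return w + h
+    # fixed: area is a product, not a sum
+    return w * h
"""

print(code_churn(example_diff))
```

A prediction model such as the one described above would estimate this quantity before the change is made; the sketch only measures it after the fact.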

Provider:

figshare

Created:

2018-10-02
