Studying the Impact of Noises in Build Breakage Data [Online Appendix]

Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/3401554
Resource description:
This document is the Online Appendix of our article, which was accepted for publication in the IEEE Transactions on Software Engineering (TSE) in August 2019. The Online Appendix includes a replication package of the scripts and data we used in our work, along with more detailed results of the findings reported in the article.

Abstract. Much research has investigated the common reasons for build breakages. However, prior research has paid little attention to builds that may break for reasons that are unlikely to be related to development activities. For example, Continuous Integration (CI) builds may break due to timeout or connection errors while generating the build. Such build breakages potentially introduce noises to build breakage data. Not considering such noises may lead to misleading results when studying CI builds. In this paper, we propose three criteria to identify build breakages that can potentially introduce noises to build breakage data. We apply these criteria to a dataset of 350,246 builds from 153 GitHub projects that are linked with Travis CI. Our results reveal that 33% of the build breakages are due to environmental factors (e.g., errors in CI servers), 29% are due to (unfixed) errors in previous builds, and 9% are due to build jobs that were later deemed by developers as noisy (there is an overlap of 17% between these three types of breakages). We measure the impact of noises in build breakage data on modeling build breakages. We observe that models that use uncleaned build breakage data can lead to misleading associations between build breakages and development activities (e.g., the role of developer). However, such associations could not be observed after eliminating noisy build breakages. Moreover, we replicate a prior study that investigates the association between build breakages and development activities using data from 14 GitHub projects. We observe that some observations reported by the prior study (e.g., pull requests cause more breakages) do not hold after eliminating the noises from build breakage data.

Citing this research. If you intend to use any materials (i.e., scripts, data, approach, or findings) of this work, please cite our article as follows:

    @article{ghaleb2019studying,
      title={Studying the Impact of Noises in Build Breakage Data},
      author={Ghaleb, Taher Ahmed and da Costa, Daniel Alencar and Zou, Ying and Hassan, Ahmed E.},
      journal={IEEE Transactions on Software Engineering},
      pages={1--14},
      year={2019},
      publisher={IEEE},
      DOI={10.1109/TSE.2019.2941880}
    }

The official publication of this research can be found at https://dx.doi.org/10.1109/TSE.2019.2941880

Replication Package. You may download the whole replication package (i.e., scripts, data, approach, or findings) using the zip file shown below. Please note that the size of the raw build logs used in this work is approximately 107 GB. You can download the raw build logs from Travis CI (using the Travis API) or from its AWS S3 backend (using the Amazon S3 API).
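
As a rough illustration of the three noise criteria summarized in the abstract, the sketch below tags broken builds in a toy build history. It is not the authors' implementation: the `Build` fields, the `ENV_KEYWORDS` list, and the rule that a breakage directly following another breakage counts as inherited from a previous build are all simplifying assumptions made here. The actual criteria are defined in the article and its replication package.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical keywords hinting at environmental failures (assumption,
# not taken from the article).
ENV_KEYWORDS = ("timeout", "timed out", "connection reset", "connection refused")


@dataclass
class Build:
    build_id: int
    broken: bool
    log_excerpt: str = ""
    # Hypothetical flag: developers later treated this job as noisy.
    marked_noisy_by_devs: bool = False


def noise_reasons(builds: List[Build]) -> Dict[int, Set[str]]:
    """Map each broken build id to the set of noise criteria it matches.

    Builds are assumed to be in chronological order on a single branch.
    """
    reasons: Dict[int, Set[str]] = {}
    prev_broken = False
    for b in builds:
        if b.broken:
            matched: Set[str] = set()
            if any(k in b.log_excerpt.lower() for k in ENV_KEYWORDS):
                matched.add("environmental")
            if prev_broken:
                # Simplification: a break right after a break is treated
                # as inherited from the previous, unfixed build.
                matched.add("cascading")
            if b.marked_noisy_by_devs:
                matched.add("developer_flagged")
            reasons[b.build_id] = matched
        prev_broken = b.broken
    return reasons


# Toy history (fabricated purely for illustration, not study data).
builds = [
    Build(1, False),
    Build(2, True, "error: compilation failed"),
    Build(3, True, "error: compilation failed"),
    Build(4, True, "Connection reset by peer"),
    Build(5, False),
    Build(6, True, "2 tests failed", marked_noisy_by_devs=True),
]

reasons = noise_reasons(builds)
noisy_ids = sorted(i for i, r in reasons.items() if r)
overlap_ids = sorted(i for i, r in reasons.items() if len(r) > 1)
print("noisy:", noisy_ids, "matching several criteria:", overlap_ids)
```

Because one build can match more than one criterion, the per-criterion shares do not simply add up, which is why the abstract reports an overlap alongside the three percentages.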
Created:
2020-01-24