10 Years Bug-Fix Dataset (PROMISE'19)

Source: DataCite Commons. Updated: 2021-09-27. Indexed: 2024-07-27.
Download link:
https://figshare.com/articles/dataset/Replication_Package_-_PROMISE_19/8852084
Description:
Replication Package of the paper "From Reports to Bug-Fix Commits: A 10 Years Dataset of Bug-Fixing Activity from 55 Apache's Open Source Projects"

ABSTRACT:
Bugs appear in almost any software development project. Solving all or at least a large part of them requires a great deal of time, effort, and budget. Software projects typically use issue tracking systems to report and monitor bug-fixing tasks. In recent years, several researchers have analyzed bug tracking data to better understand the problem and thus provide means to reduce costs and improve the efficiency of the bug-fixing task. In this paper, we introduce a new dataset composed of more than 70,000 bug-fix reports from 10 years of bug-fixing activity of 55 projects from the Apache Software Foundation, distributed in 9 categories. We mined this information from the Jira issue tracking system from two perspectives on reports with closed/resolved status: static (the latest version of the reports) and dynamic (the changes that have occurred in the reports over time). We also extract information from the commits (if they exist) that fix such bugs from their respective version-control system (Git). We also provide an analysis of the changes that occur in the reports as a way of illustrating and characterizing the proposed dataset. Since data extraction is an error-prone, nontrivial task, we believe initiatives like this can support researchers in further, more detailed investigations.

You can find the full paper at: https://doi.org/10.1145/3345629.3345639
If you use this dataset for your research, please reference the following paper:

@inproceedings{Vieira:2019:RBC:3345629.3345639,
  author = {Vieira, Renan and da Silva, Ant\^{o}nio and Rocha, Lincoln and Gomes, Jo\~{a}o Paulo},
  title = {From Reports to Bug-Fix Commits: A 10 Years Dataset of Bug-Fixing Activity from 55 Apache's Open Source Projects},
  booktitle = {Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering},
  series = {PROMISE'19},
  year = {2019},
  isbn = {978-1-4503-7233-6},
  location = {Recife, Brazil},
  pages = {80--89},
  numpages = {10},
  url = {http://doi.acm.org/10.1145/3345629.3345639},
  doi = {10.1145/3345629.3345639},
  acmid = {3345639},
  publisher = {ACM},
  address = {New York, NY, USA},
  keywords = {Bug-Fix Dataset, Mining Software Repositories, Software Traceability},
}

P.S.: We added a new dataset version (v1.0.1). This version fixes the Git commit features that track the src and test files. More information can be found in the fix-script.py file.
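The relationship between the static and dynamic perspectives described in the abstract can be illustrated with a minimal Python sketch: replaying a report's change history in time order yields its latest state. The `Change` record, the field names, and the sample values are hypothetical and chosen for illustration only; they are not the dataset's actual schema or file layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One hypothetical changelog entry of a bug report (dynamic perspective)."""
    timestamp: str  # ISO 8601, so string order matches chronological order
    field: str
    new_value: str


def replay(initial: dict, changes: list) -> dict:
    """Apply changes in chronological order and return the final report state."""
    report = dict(initial)  # copy so the initial snapshot stays untouched
    for change in sorted(changes, key=lambda c: c.timestamp):
        report[change.field] = change.new_value
    return report


# Hypothetical report as first filed, plus its out-of-order change history.
initial_report = {"key": "EXAMPLE-1", "status": "Open", "priority": "Major"}
history = [
    Change("2015-03-02T10:00:00", "status", "Resolved"),
    Change("2015-03-01T09:00:00", "priority", "Critical"),
    Change("2015-03-05T12:00:00", "status", "Closed"),
]

# The replayed result corresponds to the static perspective (latest version).
latest = replay(initial_report, history)
print(latest)
```

Sorting by timestamp before applying the changes keeps the result independent of the order in which history entries are stored.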

Provider:
figshare
Created:
2019-07-10
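Collected summary
Dataset introduction
Background and challenges
Background overview
This dataset covers 10 years of bug-fixing activity across 55 Apache open source projects. It contains more than 70,000 bug-fix reports and links them to the corresponding fix commits in Git. It offers both static and dynamic views of the reports, supporting in-depth analysis of software bug fixing.
The above content was collected and summarized by 遇见数据集.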