MLCQProjects: Towards evolving data set of industry-relevant software projects
Source: Mendeley Data | Updated: 2024-03-27 | Indexed: 2024-06-29
Download link:
https://zenodo.org/record/3666458
Resource description:
Short: This data set contains three snapshots from an evolving software project data set and a script that generates those snapshots. Projects in the snapshots are annotated with features used to assess industry relevance: documentation links, installation means, support channels, and whether the project delivers functionality or only samples.
Context: Researchers who mine software repositories face the challenge that many existing data sets reflect how software was developed in the past (typically many years ago) rather than in the present. Another challenge is that data sets based on open-source software projects include projects that are not necessarily industry-relevant.
Aim: The aim of this paper is to address the aforementioned challenges and provide both 1) a snapshot-based evolving data set of software projects (thus reflecting their present as well as previous states), manually enriched with data considered important for industrial relevance assessment, and 2) a method of assessing the industrial relevance of software projects.
Method: We present a systematic method of selecting data sets of software projects for the purpose of mining software repositories of potentially industry-relevant projects, and a semi-systematic method of assessing the industrial relevance of those projects.
Data set: The data set contains three snapshots (spanning over 10 months) of popular Java projects from GitHub, manually enriched with industrial relevance and maintenance-related information. The presented acquisition method is sufficient to generate further snapshots, and one should be able to assess the industrial relevance of the projects using the presented assessment method.
The provided data set opens directions for further research, e.g., 1) evaluation of code smell or defect prediction models on industry-relevant software projects that were prepared by independent authors and not used to build the models, and 2) analysis of social aspects of open-source projects (such as the tooling used for providing support) and how they evolve over time.
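As a rough illustration of how the annotated snapshots might be worked with, the following Python sketch models one annotated project and compares two snapshots. The class name, field names, and helper functions are hypothetical and are not taken from the data set or its generation script; the actual file layout and the authors' assessment method are not described here, so this only mirrors the four annotation features listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectAnnotation:
    """One annotated project in a snapshot (hypothetical field names)."""
    name: str
    has_documentation_link: bool
    has_installation_means: bool
    has_support_channel: bool
    delivers_functionality: bool  # False if the project only holds samples


def changed_between(older, newer):
    """Return (added, dropped) project names between two snapshots."""
    old_names = {p.name for p in older}
    new_names = {p.name for p in newer}
    return sorted(new_names - old_names), sorted(old_names - new_names)


def present_features(project):
    """List the annotation features that are set for a project."""
    flags = {
        "documentation link": project.has_documentation_link,
        "installation means": project.has_installation_means,
        "support channel": project.has_support_channel,
        "delivers functionality": project.delivers_functionality,
    }
    return [label for label, present in flags.items() if present]
```

Comparing consecutive snapshots in this way shows which projects entered or left the set over time, while the feature listing leaves the actual relevance judgment to whatever assessment method is applied.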
Created:
2023-06-28



