359,569 commits with source code density; 1149 commits of which have software maintenance activity labels (adaptive, corrective, perfective)
Source: Mendeley Data. Updated: 2024-03-27. Indexed: 2024-06-30.
Download link:
https://zenodo.org/record/2590519
Description:
This dataset comes as an SQL-importable file and is compatible with the widely available MariaDB and MySQL databases. It is based on (and incorporates and extends) the dataset "1151 commits with software maintenance activity labels (corrective,perfective,adaptive)" by Levin and Yehudai (https://doi.org/10.5281/zenodo.835534). The extensions to this dataset were obtained using Git-Tools, a tool that is included in the Git-Density suite (https://doi.org/10.5281/zenodo.2565238). For each of the projects in the original dataset, Git-Tools was run in extended mode.

The dataset contains these tables:

x1151: The original dataset from Levin and Yehudai. Despite its name, this dataset has only 1,149 commits, as two commits were duplicates in the original dataset. It has 71 features and spans 11 projects, each of which has between 99 and 114 commits: RxJava, hbase, elasticsearch, intellij-community, hadoop, drools, Kotlin, restlet-framework-java, orientdb, camel and spring-framework.

gtools_ex (short for Git-Tools, extended): Contains 359,569 commits, analyzed using Git-Tools in extended mode. It also spans all commits and projects from the x1151 dataset. All 11 projects were analyzed, from the initial commit until the end of January 2019. For the projects Intellij and Kotlin, the first 35,000 and 30,000 commits, respectively, were analyzed. This dataset introduces 35 new features (see list below), 22 of which are size- or density-related.

The dataset contains these views:

geX_L (short for Git-Tools, extended, with labels): Joins the commits' labels from x1151 with the extended attributes from gtools_ex, using the commits' hashes.

jeX_L (short for joined, extended, with labels): Joins the datasets x1151 and gtools_ex entirely, based on the commits' hashes.
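The hash-based join behind the views can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module with two toy tables instead of the real MariaDB/MySQL import. The x1151 column names (commitId, label), the view name gex_l_sketch, the use of an inner join, and all row values are assumptions made for illustration; only the gtools_ex column names SHA1 and Density come from the description.

```python
import sqlite3


def build_label_view_rows():
    """Join toy labels onto toy extended attributes via the commit hash."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- ASSUMPTION: hypothetical x1151 columns; the real table has 71 features.
        CREATE TABLE x1151 (commitId TEXT PRIMARY KEY, label TEXT);
        -- Only two of the gtools_ex features are reproduced here.
        CREATE TABLE gtools_ex (SHA1 TEXT PRIMARY KEY, Density REAL);

        -- Made-up rows, not taken from the dataset.
        INSERT INTO x1151 VALUES ('aaa', 'corrective'), ('bbb', 'perfective');
        INSERT INTO gtools_ex VALUES ('aaa', 0.75), ('bbb', 0.5), ('ccc', 1.0);

        -- Labels joined with the extended attributes, in the spirit of geX_L.
        -- ASSUMPTION: an inner join, so unlabeled commits are left out.
        CREATE VIEW gex_l_sketch AS
        SELECT g.SHA1, g.Density, x.label
        FROM gtools_ex AS g
        INNER JOIN x1151 AS x ON x.commitId = g.SHA1;
        """
    )
    rows = conn.execute(
        "SELECT SHA1, Density, label FROM gex_l_sketch ORDER BY SHA1"
    ).fetchall()
    conn.close()
    return rows


rows = build_label_view_rows()
print(rows)
```

In this sketch the unlabeled commit 'ccc' does not appear in the result, which mirrors the idea that only the commits shared by both tables carry labels.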
Features of the gtools_ex dataset:

SHA1
RepoPathOrUrl
AuthorName
CommitterName
AuthorTime (UTC)
CommitterTime (UTC)
MinutesSincePreviousCommit: Double, describing the number of minutes that passed since the previous commit. "Previous" refers to the parent commit, not the previous commit in time.
Message: The commit's message/comment.
AuthorEmail
CommitterEmail
AuthorNominalLabel: All authors of a repository are analyzed and merged by Git-Density using a heuristic, even if they do not always use the same email address or name. This label is a unique string that helps to identify the same author across commits, even if the author did not always use the exact same identity.
CommitterNominalLabel: The same as AuthorNominalLabel, but for the committer.
IsInitialCommit: A boolean indicating whether a commit is preceded by a parent or not (an initial commit has no parent).
IsMergeCommit: A boolean indicating whether a commit has more than one parent.
NumberOfParentCommits
ParentCommitSHA1s: A comma-concatenated string of the parents' SHA1 IDs.
NumberOfFilesAdded
NumberOfFilesAddedNet: Like the previous property, but if the net size of all changes of an added file is zero (i.e. when adding a file that is empty, contains only whitespace, or does not contain code), then this property does not count the file.
NumberOfLinesAddedByAddedFiles
NumberOfLinesAddedByAddedFilesNet: Like the previous property, but counts the net lines.
NumberOfFilesDeleted
NumberOfFilesDeletedNet: Like the previous property, but considers only files that had net changes.
NumberOfLinesDeletedByDeletedFiles
NumberOfLinesDeletedByDeletedFilesNet: Like the previous property, but counts the net lines.
NumberOfFilesModified
NumberOfFilesModifiedNet: Like the previous property, but considers only files that had net changes.
NumberOfFilesRenamed
NumberOfFilesRenamedNet: Like the previous property, but considers only files that had net changes.
NumberOfLinesAddedByModifiedFiles
NumberOfLinesAddedByModifiedFilesNet: Like the previous property, but counts the net lines.
NumberOfLinesDeletedByModifiedFiles
NumberOfLinesDeletedByModifiedFilesNet: Like the previous property, but counts the net lines.
NumberOfLinesAddedByRenamedFiles
NumberOfLinesAddedByRenamedFilesNet: Like the previous property, but counts the net lines.
NumberOfLinesDeletedByRenamedFiles
NumberOfLinesDeletedByRenamedFilesNet: Like the previous property, but counts the net lines.
Density: The ratio between the sum of all net lines (added, deleted, modified and renamed) and the sum of their respective gross versions. A density of zero means that the sum of net lines is zero (i.e. all line changes were just whitespace, comments etc.). A density of 1 means that all changed net lines contribute to the gross size of the commit (i.e. no useless lines with e.g. only comments or whitespace).
AffectedFilesRatioNet: The ratio between the sums of NumberOfFilesXXX and NumberOfFilesXXXNet.

This dataset supports the paper "Importance and Aptitude of Source code Density for Commit Classification into Maintenance Activities", as submitted to the QRS2019 conference (The 19th IEEE International Conference on Software Quality, Reliability, and Security).

Citation: Hönel, S., Ericsson, M., Löwe, W. and Wingkvist, A., 2019. Importance and Aptitude of Source code Density for Commit Classification into Maintenance Activities. In The 19th IEEE International Conference on Software Quality, Reliability, and Security.
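As a rough illustration of the Density and AffectedFilesRatioNet features described above, the following sketch recomputes both from the per-commit counts. It is not Git-Tools' implementation. The helper name net_to_gross_ratio, the net-over-gross direction assumed for AffectedFilesRatioNet, the choice to return 0.0 when the gross sum is zero, and the example values are all assumptions; only the feature names are taken from the list.

```python
# Gross line-count features; each has a counterpart with the suffix "Net".
LINE_FEATURES = [
    "NumberOfLinesAddedByAddedFiles",
    "NumberOfLinesDeletedByDeletedFiles",
    "NumberOfLinesAddedByModifiedFiles",
    "NumberOfLinesDeletedByModifiedFiles",
    "NumberOfLinesAddedByRenamedFiles",
    "NumberOfLinesDeletedByRenamedFiles",
]

# Gross file-count features; each has a counterpart with the suffix "Net".
FILE_FEATURES = [
    "NumberOfFilesAdded",
    "NumberOfFilesDeleted",
    "NumberOfFilesModified",
    "NumberOfFilesRenamed",
]


def net_to_gross_ratio(commit, features):
    """Return sum(net) / sum(gross) over the given features.

    ASSUMPTION: a commit with a gross sum of zero yields 0.0.
    """
    gross = sum(commit.get(name, 0) for name in features)
    net = sum(commit.get(name + "Net", 0) for name in features)
    return net / gross if gross else 0.0


# Made-up commit: 40 gross lines of which 30 are net, 4 files of which 2 are net.
commit = {
    "NumberOfLinesAddedByAddedFiles": 10, "NumberOfLinesAddedByAddedFilesNet": 8,
    "NumberOfLinesAddedByModifiedFiles": 20, "NumberOfLinesAddedByModifiedFilesNet": 12,
    "NumberOfLinesDeletedByModifiedFiles": 10, "NumberOfLinesDeletedByModifiedFilesNet": 10,
    "NumberOfFilesAdded": 1, "NumberOfFilesAddedNet": 1,
    "NumberOfFilesModified": 3, "NumberOfFilesModifiedNet": 1,
}

density = net_to_gross_ratio(commit, LINE_FEATURES)
affected_files_ratio_net = net_to_gross_ratio(commit, FILE_FEATURES)
print(density, affected_files_ratio_net)  # 0.75 0.5
```

Features missing from the example dictionary are treated as zero, so a commit that only touches whitespace or comments would yield a density of 0.0 under these assumptions.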
Created:
2023-06-28



