
GDED (GitDelver Enterprise Dataset)

Indexed by NIAID Data Ecosystem on 2026-03-13
Download link:
https://zenodo.org/record/5838536
Description:
This dataset was produced by analyzing the Git commits of 101 closed-source software repositories with the GitDelver mining tool (Zenodo, GitHub). It consists of three CSV files.

commits_history.csv contains data about 106,216 Git commits made by 164 developers. Its columns are:
- Repository: the name of the repository.
- Branches: the list of branches into which this modification has been integrated (works best if you target a bare repository).
- NbBranches: the number of branches into which this modification has been integrated (works best if you target a bare repository).
- CommitId: the identifier of the commit.
- Author: the author of the modification.
- DateTime: the date and time of the modification.
- Date: the date of the modification.
- HourOfDay: the hour of the day at which the modification took place.
- Merge: flag indicating whether the commit is a merge commit.
- BugFix: flag indicating whether the modification is a bug fix.
- SATD: flag indicating whether the modification contains Self-Admitted Technical Debt.
- NbModifiedFiles: the total number of files (supported and unsupported) modified by this commit.
- ModifiedFiles: the list of files modified by this commit.
- NbModifiedProdSourceFiles: the number of production source files modified by this commit.
- NbModifiedTestSourceFiles: the number of test source files modified by this commit.
- NbModifications: the total number of modifications made by this commit.
- NbInsertions: the number of insertions made by this commit.
- NbDeletions: the number of deletions made by this commit.

files_history.csv contains data about 470,940 file modifications made by 153 developers. Its columns are:
- Repository: the name of the repository.
- Branches: the list of branches into which this modification has been integrated (works best if you target a bare repository).
- NbBranches: the number of branches into which this modification has been integrated (works best if you target a bare repository).
- OldFilePath: the old relative path to the file.
- FilePath: the relative path to the file.
- FileName: the name of the file.
- FileExtension: the file extension.
- FileType: the type of the file ("Production" or "Test").
- ChangeType: the type of the change ("ADD", "COPY", "RENAME", "DELETE", "MODIFY" or "UNKNOWN").
- NbMethods: the number of methods in the file.
- NbMethodsChanged: the number of methods modified in this file by this commit.
- NLOC: the number of lines of code in the file.
- Complexity: the Weighted Methods per Class complexity, i.e., the sum of the cyclomatic complexity numbers of all the methods in the file.
- NlocDivByNbMethods: the number of lines of code in the file divided by the number of methods in the file.
- ComplexDivByNbMethods: the complexity of the file divided by the number of methods in the file.
- SATD: flag indicating whether the modification contains Self-Admitted Technical Debt.
- NbLinesAdded: the number of lines added.
- NbLinesDeleted: the number of lines deleted.
- CommitId: the identifier of the commit.
- Author: the author of the modification.
- DateTime: the date and time of the modification.
- Date: the date of the modification.
- HourOfDay: the hour of the day at which the modification took place.

methods_history.csv contains data about 3,471,556 method modifications made by 153 developers. Its columns are:
- Repository: the name of the repository.
- Branches: the list of branches into which this modification has been integrated (works best if you target a bare repository).
- NbBranches: the number of branches into which this modification has been integrated (works best if you target a bare repository).
- OldFilePath: the old relative path to the file.
- FilePath: the relative path to the file.
- FileName: the name of the file.
- FileType: the type of the file ("Production" or "Test").
- MethodName: the name of the method.
- NbParams: the number of parameters in the method signature.
- NLOC: the number of lines of code in the method.
- Complexity: the cyclomatic complexity number of the method.
- CommitId: the identifier of the commit.
- Author: the author of the modification.
- DateTime: the date and time of the modification.
- Date: the date of the modification.
- HourOfDay: the hour of the day at which the modification took place.

Remarks: All of the data come from a single industrial organization, which wishes to remain anonymous. files_history.csv references commits_history.csv, and methods_history.csv references both files_history.csv and commits_history.csv. For privacy (GDPR) and security reasons, several columns have been anonymized or have had their contents scrambled.
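The Remarks state that files_history.csv references commits_history.csv, but they do not name the key columns. The following is a minimal Python sketch built on two assumptions. First, a file modification points to its commit through the (Repository, CommitId) pair. Second, the CSVs are comma-separated with a header row.

The sketch uses only the standard-library csv module. It indexes the commits and attaches each file row's parent-commit BugFix flag. It also recomputes NlocDivByNbMethods as NLOC / NbMethods.

The following are hypothetical illustrations, not part of the dataset or of GitDelver:
- the helper names (index_commits, join_files_to_commits, nloc_per_method);
- the convention of returning 0.0 for files without methods;
- the inline sample rows.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

# Assumption: a commit is uniquely identified by (Repository, CommitId).
CommitKey = Tuple[str, str]
Row = Dict[str, str]


def index_commits(rows: Iterable[Row]) -> Dict[CommitKey, Row]:
    """Index commits_history.csv rows by (Repository, CommitId)."""
    return {(r["Repository"], r["CommitId"]): r for r in rows}


def join_files_to_commits(file_rows: Iterable[Row],
                          commits: Dict[CommitKey, Row]) -> List[Row]:
    """Copy the parent commit's BugFix flag onto each file modification.

    File rows whose commit is not found in the index are skipped.
    """
    joined: List[Row] = []
    for f in file_rows:
        commit = commits.get((f["Repository"], f["CommitId"]))
        if commit is None:
            continue
        joined.append({**f, "CommitBugFix": commit["BugFix"]})
    return joined


def nloc_per_method(file_row: Row) -> float:
    """Recompute NlocDivByNbMethods = NLOC / NbMethods.

    Returning 0.0 for files without methods is a hypothetical convention.
    """
    nb_methods = int(file_row["NbMethods"])
    return int(file_row["NLOC"]) / nb_methods if nb_methods else 0.0


# Hypothetical sample rows using a subset of the documented columns.
COMMITS_CSV = """Repository,CommitId,Author,BugFix
repo-a,c1,dev1,True
repo-a,c2,dev2,False
"""

FILES_CSV = """Repository,CommitId,FileName,NLOC,NbMethods
repo-a,c1,Foo.java,120,4
repo-a,c2,Bar.java,30,0
repo-a,c9,Baz.java,10,1
"""

if __name__ == "__main__":
    commits = index_commits(csv.DictReader(io.StringIO(COMMITS_CSV)))
    joined = join_files_to_commits(csv.DictReader(io.StringIO(FILES_CSV)), commits)
    for row in joined:
        # Baz.java is skipped because commit c9 is not in the sample commits.
        print(row["FileName"], row["CommitBugFix"], nloc_per_method(row))
```

To run the sketch on the real data, replace the io.StringIO sources with open("commits_history.csv", newline="", encoding="utf-8") and the equivalent call for files_history.csv. The encoding is an assumption. The same keying idea could extend to methods_history.csv, which references both other files.

Several columns are anonymized or scrambled. Check that the key columns still match across files before relying on the join.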
Created: 2022-01-14