GDED (GitDelver Enterprise Dataset)
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/5838536
Description:
This dataset was produced by analyzing the Git commits from 101 closed-source software repositories with the GitDelver mining tool (Zenodo, GitHub).
It consists of three CSV files.
commits_history.csv contains data on 106,216 Git commits made by 164 developers. It has the following columns:
Repository: the name of the repository.
Branches: the list of branches in which this modification has been integrated (works best if you target a bare repository).
NbBranches: the number of branches in which this modification has been integrated (works best if you target a bare repository).
CommitId: the identifier of the commit.
Author: the author of the modification.
DateTime: the date and time of the modification.
Date: the date of the modification.
HourOfDay: the hour of the day at which the modification took place.
Merge: flag indicating whether the commit is a merge commit.
BugFix: flag indicating whether the modification is a bug fix.
SATD: flag indicating whether the modification contains Self-Admitted Technical Debt.
NbModifiedFiles: the total number of files (supported and unsupported) modified by this commit.
ModifiedFiles: the list of files modified by this commit.
NbModifiedProdSourceFiles: the number of production source files modified by this commit.
NbModifiedTestSourceFiles: the number of test source files modified by this commit.
NbModifications: the total number of modifications done by this commit.
NbInsertions: the number of insertions done by the commit.
NbDeletions: the number of deletions done by the commit.
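As a rough illustration of how these columns can be used, the sketch below computes the share of bug-fix commits per repository with Python's standard csv module. The sample rows are invented, the comma delimiter is assumed, and the encoding of the flag columns as "True"/"False" (or "1"/"0") is an assumption that should be checked against the actual file.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; the real file has more columns and far more rows.
SAMPLE = """Repository,CommitId,Merge,BugFix,NbInsertions,NbDeletions
repo_a,c1,False,True,10,2
repo_a,c2,False,False,5,0
repo_a,c3,True,False,0,0
repo_b,c4,False,True,7,7
"""

def is_set(value):
    # Assumption: flags are stored as "True"/"False" or "1"/"0".
    return value.strip().lower() in {"true", "1"}

def bugfix_share(handle):
    """Return {repository: fraction of non-merge commits flagged BugFix}."""
    totals = defaultdict(int)
    fixes = defaultdict(int)
    for row in csv.DictReader(handle):
        if is_set(row["Merge"]):
            continue  # merge commits are left out of the ratio
        totals[row["Repository"]] += 1
        if is_set(row["BugFix"]):
            fixes[row["Repository"]] += 1
    return {repo: fixes[repo] / totals[repo] for repo in totals}

shares = bugfix_share(io.StringIO(SAMPLE))
print(shares)
```

To run this on the real data, replace `io.StringIO(SAMPLE)` with an opened file handle, for example `open("commits_history.csv", newline="")`.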
files_history.csv contains data on 470,940 file modifications made by 153 developers. It has the following columns:
Repository: the name of the repository.
Branches: the list of branches in which this modification has been integrated (works best if you target a bare repository).
NbBranches: the number of branches in which this modification has been integrated (works best if you target a bare repository).
OldFilePath: the old relative path to the file.
FilePath: the relative path to the file.
FileName: the name of the file.
FileExtension: the file extension.
FileType: the type of the file ("Production" or "Test").
ChangeType: the type of the change ("ADD", "COPY", "RENAME", "DELETE", "MODIFY" or "UNKNOWN").
NbMethods: the number of methods in the file.
NbMethodsChanged: the number of methods that have been modified in this file for this commit.
NLOC: the number of lines of code of the file.
Complexity: the Weighted Methods per Class complexity, i.e., the sum of the cyclomatic complexity numbers of all the methods of the file.
NlocDivByNbMethods: the number of lines of code of the file divided by the number of methods of the file.
ComplexDivByNbMethods: the complexity of the file divided by the number of methods of the file.
SATD: flag indicating whether the modification contains Self-Admitted Technical Debt.
NbLinesAdded: the number of lines added.
NbLinesDeleted: the number of lines deleted.
CommitId: the identifier of the commit.
Author: the author of the modification.
DateTime: the date and time of the modification.
Date: the date of the modification.
HourOfDay: the hour of the day at which the modification took place.
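The two derived columns, NlocDivByNbMethods and ComplexDivByNbMethods, are plain per-method averages. A minimal sketch of the arithmetic follows; the helper name `per_method` is hypothetical, and returning `None` for a file without methods is an assumption, since the description does not say how the dataset records that case.

```python
def per_method(total, nb_methods):
    """Divide a file-level total (NLOC or Complexity) by the method count."""
    if nb_methods == 0:
        # Assumption: how the dataset encodes this case is not documented.
        return None
    return total / nb_methods

# Hypothetical file: NLOC = 120, NbMethods = 8, Complexity = 20.
nloc_div_by_nb_methods = per_method(120, 8)
complex_div_by_nb_methods = per_method(20, 8)
print(nloc_div_by_nb_methods, complex_div_by_nb_methods)
```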
methods_history.csv contains data on 3,471,556 method modifications made by 153 developers. It has the following columns:
Repository: the name of the repository.
Branches: the list of branches in which this modification has been integrated (works best if you target a bare repository).
NbBranches: the number of branches in which this modification has been integrated (works best if you target a bare repository).
OldFilePath: the old relative path to the file.
FilePath: the relative path to the file.
FileName: the name of the file.
FileType: the type of the file ("Production" or "Test").
MethodName: the name of the method.
NbParams: the number of parameters in the method signature.
NLOC: the number of lines of code of the method.
Complexity: the cyclomatic complexity number of the method.
CommitId: the identifier of the commit.
Author: the author of the modification.
DateTime: the date and time of the modification.
Date: the date of the modification.
HourOfDay: the hour of the day at which the modification took place.
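One possible use of the method-level rows is to count how often each method was modified. The sketch below does this with invented sample rows. It identifies a method by (Repository, FilePath, MethodName), which is an assumption: overloaded methods share a name, and renames recorded in OldFilePath are not followed here.

```python
import csv
import io
from collections import Counter

# Invented sample rows with a subset of the columns.
SAMPLE = """Repository,FilePath,MethodName,Complexity,CommitId
repo_a,src/a.py,parse,4,c1
repo_a,src/a.py,parse,5,c2
repo_a,src/a.py,render,2,c2
repo_b,lib/b.py,parse,1,c4
"""

def method_change_counts(handle):
    """Count modification rows per (Repository, FilePath, MethodName)."""
    counts = Counter()
    for row in csv.DictReader(handle):
        key = (row["Repository"], row["FilePath"], row["MethodName"])
        counts[key] += 1
    return counts

counts = method_change_counts(io.StringIO(SAMPLE))
most_changed = counts.most_common(1)[0]
print(most_changed)
```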
Remarks:
All data come from a single industrial organization that wishes to remain anonymous.
files_history.csv references commits_history.csv; methods_history.csv references both files_history.csv and commits_history.csv.
For privacy (GDPR) and security reasons, several columns have been anonymized or have had their contents scrambled.
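The references between the files can be followed with a simple lookup join. The sketch below attaches the commit-level BugFix flag to each file modification. It assumes that the pair (Repository, CommitId) identifies a commit and that these identifiers stay consistent across the three files after anonymization; neither point is stated in the description. The sample rows are invented.

```python
import csv
import io

# Invented sample rows with a subset of the columns.
COMMITS = """Repository,CommitId,BugFix
repo_a,c1,True
repo_a,c2,False
"""
FILES = """Repository,CommitId,FilePath,NbLinesAdded
repo_a,c1,src/a.py,10
repo_a,c2,src/a.py,3
repo_a,c2,src/b.py,1
"""

def join_files_to_commits(commits_handle, files_handle):
    """Return file rows extended with the BugFix flag of their commit."""
    # Assumed key: (Repository, CommitId).
    commits = {
        (row["Repository"], row["CommitId"]): row
        for row in csv.DictReader(commits_handle)
    }
    joined = []
    for file_row in csv.DictReader(files_handle):
        commit = commits.get((file_row["Repository"], file_row["CommitId"]))
        if commit is None:
            continue  # file row without a matching commit row
        joined.append({**file_row, "BugFix": commit["BugFix"]})
    return joined

rows = join_files_to_commits(io.StringIO(COMMITS), io.StringIO(FILES))
print(len(rows), rows[0])
```

The same pattern extends to methods_history.csv by adding FilePath to the lookup key, again as an assumption about how the rows correspond.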
Created:
2022-01-14



