Dependabot and Security Pull Requests
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/7793971
Description:
This deposit contains four (4) main datasets used in the study "Dependabot and Security Pull Requests: Large Empirical Study" (Link). Each dataset is described below:
Dataset (1) - Dependency Update: This dataset covers issues associated with pull requests (PRs) that users and bots created to manage dependency updates in GitHub projects. The search matched the keywords "Dependency, Update" in the title, body, or comments of PRs created between 26/05/2017 and 15/06/2021 (partition 1) and between 01/01/2023 and 30/09/2023 (partition 2). Partition (1) contains 6,573,489 PR-related issues from 927,007 repositories; partition (2) contains 3,342,829 PR-related issues from 816,028 repositories.
Dataset (2) - Dependabot Security PRs: This dataset covers PRs created by Dependabot to handle security vulnerabilities in project dependencies. The search selected PR-related issues that were created by "Dependabot-preview" or "Dependabot", carry the label "security", and were created between 26/05/2017 and 30/09/2023. The results comprise 422,388 issues from 47,987 repositories.
Dataset (3) - Manual Security PRs: This dataset covers PRs created only by users to handle security vulnerabilities. The search matched the keywords "Dependency, Vulnerable" in the title, body, or comments of PRs created between 26/05/2017 and 30/09/2023, keeping only pull requests whose author has the type "user". The final results include 186,186 issues from 60,758 repositories.
Dataset (4) - Bots' Security PRs: This dataset covers PRs created by several bots to handle security vulnerabilities in project dependencies. The search selected PR-related issues in which the keywords "Dependency", "Security", and "Vulnerability" all appear in the title, body, or comments of the PR. These PRs were created by one of the following bots: "Snyk", "Renovate", "Greenkeeper", or "Depfu", between 26/05/2017 and 30/09/2023. The results for the 4 bots consist of 628,495 PR-related issues from 105,342 repositories.
We also included:
Derived Sample: This sample contains the data that was selected and extracted for our manual qualitative analysis and manual feature extraction.
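The selection criteria described for Datasets (2) to (4) can be sketched as simple predicates over PR records. This is only an illustrative sketch, not the authors' actual query: the `PullRequest` record layout and all function names are hypothetical, the deposit's real schema may differ, and it assumes that every listed keyword must appear (the description states this explicitly only for Dataset (4)) and that author names match the names listed above, ignoring case.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class PullRequest:
    # Hypothetical record layout; the deposit's actual fields may differ.
    title: str
    body: str
    comments: list[str]
    author: str
    author_type: str  # e.g. "User" or "Bot"
    labels: list[str]
    created: date


# Collection window stated for Datasets (2) to (4).
START = date(2017, 5, 26)
END = date(2023, 9, 30)


def in_window(pr: PullRequest, start: date = START, end: date = END) -> bool:
    return start <= pr.created <= end


def mentions_all(pr: PullRequest, keywords: list[str]) -> bool:
    # Case-insensitive substring match over title, body, and comments.
    # Assumption: all keywords are required; plural forms such as
    # "dependencies" would not match "dependency" in this simple version.
    text = " ".join([pr.title, pr.body, *pr.comments]).lower()
    return all(k.lower() in text for k in keywords)


def is_dependabot_security(pr: PullRequest) -> bool:
    # Dataset (2): Dependabot author, "security" label, inside the window.
    return (
        pr.author.lower() in {"dependabot", "dependabot-preview"}
        and "security" in {label.lower() for label in pr.labels}
        and in_window(pr)
    )


def is_manual_security(pr: PullRequest) -> bool:
    # Dataset (3): human author, keywords "Dependency" and "Vulnerable".
    return (
        pr.author_type.lower() == "user"
        and mentions_all(pr, ["dependency", "vulnerable"])
        and in_window(pr)
    )


def is_bot_security(pr: PullRequest) -> bool:
    # Dataset (4): one of the four named bots, all three keywords present.
    return (
        pr.author.lower() in {"snyk", "renovate", "greenkeeper", "depfu"}
        and mentions_all(pr, ["dependency", "security", "vulnerability"])
        and in_window(pr)
    )
```

Keeping each dataset's rule in its own predicate makes it easy to check a record against one definition at a time, for example `is_manual_security(pr)` for a PR opened by a human author.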
Created:
2024-08-04



