
Data underlying the BSc project: "An analysis of Java release practices on GitHub"

DataCite Commons, updated 2024-01-29, indexed 2024-07-03
Download link:
https://data.4tu.nl/datasets/67a790fe-b65a-4c30-aae0-c5b2dc7e5d4d
Description:
This dataset contains the following inside a tar.zst file:

- A list of all Java repositories on GitHub in a CSV format
- The POM.xml file from those repositories if there was one at the root of the repo
- A sample of 500 000 repositories that:
  - Have been searched recursively for POM.xml files
  - Of those that have a POM.xml file an 'effective' POM.xml has been created
  - Of those that have distribution repositories configured, GitHub workflow files if they exist
- A report.json file that contains aggregate information of the sample

The scraper written to retrieve this data is also included.

This dataset was created for a Computer Science Bachelor Research Project titled "An analysis of Java release practices on GitHub" by Vivian Roest.
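The description's "distribution repositories configured" criterion can be illustrated with a small check on a POM file. The following is a minimal sketch, not the scraper's actual logic: it assumes the criterion corresponds to a `<distributionManagement>` element containing a `<repository>` or `<snapshotRepository>` with a `<url>`, and the function name `distribution_repository_urls` is hypothetical.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Strip any '{namespace}' prefix so namespaced and plain POMs both match."""
    return tag.rsplit("}", 1)[-1]


def distribution_repository_urls(pom_xml: str) -> list[str]:
    """Return the URLs of distribution repositories declared in a POM.

    Assumption: only direct children of <project> are inspected, so
    repositories declared inside <profiles> are ignored.
    """
    root = ET.fromstring(pom_xml)
    urls = []
    for child in root:
        if _local(child.tag) != "distributionManagement":
            continue
        for repo in child:
            if _local(repo.tag) not in ("repository", "snapshotRepository"):
                continue
            for field in repo:
                if _local(field.tag) == "url" and field.text:
                    urls.append(field.text.strip())
    return urls


if __name__ == "__main__":
    # Illustrative POM; the repository id and URL are made up.
    example = """
    <project xmlns="http://maven.apache.org/POM/4.0.0">
      <modelVersion>4.0.0</modelVersion>
      <distributionManagement>
        <repository>
          <id>releases</id>
          <url>https://example.org/maven/releases</url>
        </repository>
      </distributionManagement>
    </project>
    """
    print(distribution_repository_urls(example))
```

Running this kind of check on an 'effective' POM rather than the raw file would also pick up settings inherited from parent POMs, which is presumably why the dataset includes effective POMs.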

Provider:
4TU.ResearchData
Created:
2024-01-29