
452,000,000 public Git commits on GitHub (October 2016)

Source: NIAID Data Ecosystem. Indexed 2026-03-11.
Download link:
https://zenodo.org/records/285467
Description:
What's inside

part-000xx.lzo - LZO archives with the data (refer to "Format").
part-000xx.lzo.index - LZO index files so that the archives are splittable in Hadoop.
stats.csv.gz - GZIP-ed CSV file with some repository statistics related to the commits.

Format

part-000xx - text, one line per repository; every line is JSON with the following scheme:

{ "r": "repository name", "c": [{ "h": "git hash", "a": "author's email hash", "t": "date and time commit was created", "m": "commit message" }, ...] }

The date and time format is mostly that of the Go language's time.Time.String(); I recommend using dateutil.parser.parse() to parse it with Python. The commit message contains explicit \r and \n symbols so that it fits on a single line.

stats.csv has 4 columns: repository name, number of commits, number of contributors, average length of the commit messages.
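The line format described above can be read with a short Python sketch. It is illustrative only and uses just the standard library: it assumes the LZO archive has already been decompressed to plain text, and that a timestamp looks like "YYYY-MM-DD HH:MM:SS +ZZZZ NAME". The description says the format is only "mostly" Go's, so timestamps that do not match are returned as None rather than guessed. The function names parse_go_time and parse_repository_line and the sample line are hypothetical and not part of the dataset.

```python
import json
from datetime import datetime


def parse_go_time(text):
    """Best-effort parse of 'YYYY-MM-DD HH:MM:SS +ZZZZ NAME' (assumed layout).

    Returns a timezone-aware datetime, or None if the text does not match.
    """
    parts = text.split()
    try:
        # Keep date, time and numeric offset; ignore the trailing zone name.
        return datetime.strptime(" ".join(parts[:3]), "%Y-%m-%d %H:%M:%S %z")
    except ValueError:
        return None


def parse_repository_line(line):
    """Turn one JSON line into (repository name, list of commit dicts)."""
    record = json.loads(line)
    commits = [
        {
            "hash": c["h"],
            "author": c["a"],  # hash of the author's email, not the email
            "time": parse_go_time(c["t"]),
            "message": c["m"],  # json.loads has already decoded JSON escapes
        }
        for c in record["c"]
    ]
    return record["r"], commits


# Made-up sample line, for illustration only (not taken from the dataset).
sample_line = (
    '{"r": "example/repo", "c": [{"h": "0123abcd", "a": "deadbeef", '
    '"t": "2016-10-01 12:34:56 +0000 UTC", "m": "Fix typo\\nin README"}]}'
)

name, commits = parse_repository_line(sample_line)
print(name, len(commits), commits[0]["time"])
```

For real data, each line of a decompressed part-000xx file would be passed to parse_repository_line in turn. Timestamps that come back as None can be retried with a more lenient parser such as the dateutil.parser.parse() call recommended in the description.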
Created:
2020-01-24