Readme files in 16,000,000 public GitHub repositories (October 2016)
Source: NIAID Data Ecosystem, indexed 2026-03-11
Download link:
https://zenodo.org/records/285419
Description:
Format
index.csv.gz - comma-separated CSV file with 3 columns: repository name, flag, readme file name. For example: src-d/go-git,s,README.md
The flag is either "s" (readme found) or "r" (readme does not exist at the root directory level). The readme file name may be any of the following:
"README.md", "readme.md", "Readme.md", "README.MD", "README.txt", "readme.txt", "Readme.txt", "README.TXT", "README", "readme", "Readme", "README.rst", "readme.rst", "Readme.rst", "README.RST"
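As an illustration of the index layout described above, the following minimal sketch reads index.csv.gz with the Python standard library. The function names read_index and count_flags are hypothetical, and the sketch assumes the file is UTF-8 encoded and that rows flagged "r" may carry an empty or missing readme file name; neither point is confirmed by the dataset description.

```python
import csv
import gzip
from collections import Counter


def read_index(path):
    """Yield (repository, flag, readme_name) tuples from index.csv.gz.

    Assumption: the file is UTF-8 text with one record per line.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # skip blank lines
            # Assumption: rows flagged "r" may omit the readme file name,
            # so short rows are padded with empty strings.
            repository, flag, readme_name = (row + ["", ""])[:3]
            yield repository, flag, readme_name


def count_flags(path):
    """Count how many repositories carry each flag ("s" or "r")."""
    return Counter(flag for _, flag, _ in read_index(path))
```

Calling count_flags("index.csv.gz") would give the number of repositories with and without a root-level readme, which is a quick sanity check before touching the much larger part files.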
100 part-r-00xxx files are in "new" Hadoop API format with the following settings:
inputFormatClass is org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
keyClass is org.apache.hadoop.io.Text - repository name
valueClass is org.apache.hadoop.io.BytesWritable - gzipped readme file
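The key and value classes listed above suggest one way to read the part files from Python. The sketch below is not the dataset authors' own loader: it assumes PySpark is installed, uses SparkContext.sequenceFile (which takes the key and value classes directly) rather than the format class named above, and assumes the readme bytes decode as UTF-8, with undecodable bytes replaced. The names decode_readme and load_readmes are hypothetical.

```python
import gzip


def decode_readme(value, encoding="utf-8"):
    """Decompress one gzipped readme value and decode it to text.

    Assumption: readmes are mostly UTF-8; undecodable bytes are replaced.
    """
    raw = gzip.decompress(bytes(value))
    return raw.decode(encoding, errors="replace")


def load_readmes(spark_context, path):
    """Return an RDD of (repository name, readme text) pairs.

    Requires PySpark; `spark_context` is an existing SparkContext and
    `path` points at one or more part-r-00xxx files.
    """
    pairs = spark_context.sequenceFile(
        path,
        keyClass="org.apache.hadoop.io.Text",
        valueClass="org.apache.hadoop.io.BytesWritable",
    )
    # PySpark hands BytesWritable values over as bytearray objects.
    return pairs.mapValues(decode_readme)
```

Keeping the decompression in its own function means it can be checked without a Spark cluster, for example by passing it the output of gzip.compress.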
Created:
2020-01-24



