Readme files in 16,000,000 public GitHub repositories (October 2016)
Source: Mendeley Data. Updated 2024-03-27; indexed 2024-06-29.
Download link:
https://zenodo.org/record/285419
Description:
Format

index.csv.gz: a gzip-compressed CSV (comma-separated) file with three columns: <repository name>,<flag>,<readme file name>. For example:

src-d/go-git,s,README.md

The flag is either "s" (a readme was found) or "r" (no readme exists at the root directory level). The readme file name may be any of the following: "README.md", "readme.md", "Readme.md", "README.MD", "README.txt", "readme.txt", "Readme.txt", "README.TXT", "README", "readme", "Readme", "README.rst", "readme.rst", "Readme.rst", "README.RST".

part-r-00xxx: 100 files in the "new" Hadoop API format, written with the following settings:
outputFormatClass is org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
keyClass is org.apache.hadoop.io.Text (the repository name)
valueClass is org.apache.hadoop.io.BytesWritable (the gzipped readme file)
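As a minimal sketch, the index can be read with Python's standard library. It assumes index.csv.gz has no header row and that rows flagged "r" may carry an empty or missing third column; neither assumption is stated in the description above. The helper names (parse_index, load_index, summarize, decode_readme) are hypothetical. decode_readme only handles a value payload that has already been extracted from a part-r file; reading the SequenceFiles themselves requires a SequenceFile reader (for example Hadoop's own) and is not shown here.

```python
import csv
import gzip
from collections import Counter

# Readme file names listed in the dataset description.
KNOWN_README_NAMES = {
    "README.md", "readme.md", "Readme.md", "README.MD",
    "README.txt", "readme.txt", "Readme.txt", "README.TXT",
    "README", "readme", "Readme",
    "README.rst", "readme.rst", "Readme.rst", "README.RST",
}


def parse_index(lines):
    """Yield (repository, flag, readme_name) tuples from index.csv rows.

    Assumption: no header row; a missing third column is treated as "".
    """
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        repo = row[0].strip()
        flag = row[1].strip() if len(row) > 1 else ""
        readme = row[2].strip() if len(row) > 2 else ""
        if flag not in ("s", "r"):
            raise ValueError(f"unexpected flag {flag!r} for {repo!r}")
        yield repo, flag, readme


def load_index(path="index.csv.gz"):
    """Read the whole gzipped index into memory (path is an assumption)."""
    # Text mode ("rt") decompresses and decodes so csv.reader sees str lines.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        return list(parse_index(f))


def summarize(rows):
    """Count flags, count readme names for found readmes, list unknown names."""
    flags = Counter(flag for _, flag, _ in rows)
    names = Counter(name for _, flag, name in rows if flag == "s")
    unexpected = {name for name in names if name not in KNOWN_README_NAMES}
    return flags, names, unexpected


def decode_readme(value_bytes):
    """Decompress one BytesWritable payload (a gzipped readme) to text."""
    return gzip.decompress(value_bytes).decode("utf-8", errors="replace")
```

For example, summarize(load_index()) returns the flag counts, the per-name counts for repositories where a readme was found, and any file names outside the documented list. Loading everything into a list keeps the sketch short; for the full 16 million rows, iterating over parse_index directly avoids holding every row in memory.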
Created:
2023-06-28



