Gitome: A curated dataset for GitHub README-related tasks
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10295080
Resource description:
About
This repository contains the source code used to replicate the experimental results reported in the paper submitted to the 21st International Conference on Mining Software Repositories (MSR 2024):
"Gitome: A curated dataset for GitHub README-related tasks"
authored by:
Claudio Di Sipio, Juri Di Rocco, Riccardo Rubei, Phuong Than Nguyen, and Davide Di Ruscio,
Università degli Studi dell'Aquila, Italy
Data description
The dataset is structured as follows:
emf_metamodel.zip: It contains the Ecore project with the Gitome data model
existing_dumps.zip: It contains the existing datasets used to build Gitome
lang_aggr_stats.csv: It contains the language data to compute the statistics presented in the paper
langs.csv: It contains all the languages and their frequency
output_dataset.zip: It contains the benchmarking dataset obtained by parsing the README files
repository_lists.zip: It contains the list of repositories for each considered dataset (with possible duplicates)
topics.csv: It contains all the topics and their frequency
topics_aggr_stats.csv: It contains the topics data to compute the statistics presented in the paper
gitome_repo.txt: It contains the list of the URLs of the considered GitHub repositories
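As an illustration of how the repository list might be consumed, the following minimal sketch reads a file such as gitome_repo.txt and reduces it to unique owner/name pairs. It assumes one full GitHub URL per line (for example https://github.com/owner/name); the archive description does not confirm that layout. The function names parse_repo_urls and load_repo_list are hypothetical helpers and are not part of the Gitome tooling.

```python
from urllib.parse import urlparse


def parse_repo_urls(lines):
    """Return unique (owner, name) pairs in first-seen order.

    Assumption: each non-empty line is a full GitHub URL of the form
    https://github.com/<owner>/<name>, optionally ending in ".git" or "/".
    """
    seen = set()
    repos = []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        # Keep only the non-empty path segments, e.g. ["owner", "name"].
        parts = [p for p in urlparse(url).path.split("/") if p]
        if len(parts) < 2:
            continue  # not an owner/name URL
        owner, name = parts[0], parts[1]
        if name.endswith(".git"):
            name = name[: -len(".git")]
        # GitHub treats owner and repository names case-insensitively,
        # so duplicates are detected on the lowercased pair.
        key = (owner.lower(), name.lower())
        if key not in seen:
            seen.add(key)
            repos.append((owner, name))
    return repos


def load_repo_list(path):
    """Read a URL list file (hypothetical layout: one URL per line)."""
    with open(path, encoding="utf-8") as fh:
        return parse_repo_urls(fh)
```

Calling load_repo_list("gitome_repo.txt") after downloading the file from the archive would return the deduplicated pairs. The same helper could be applied to the per-dataset lists in repository_lists.zip, which are described as possibly containing duplicates, provided they follow the same one-URL-per-line layout.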
How to collect Gitome
To collect all the data stored in this archive, please refer to the supporting GitHub repository: https://github.com/MDEGroup/Gitome-MSR2024
Created:
2023-12-11



