GitHub Repositories Dataset
arXiv · Indexed 2025-09-30
Download link:
https://github.com/jpmorganchase/topical
Resource description:
This dataset consists of publicly accessible GitHub repositories used to train topic models that generate repository-level embeddings. It also includes repositories covering 5 popular topics, scraped directly from GitHub. The target task is automatic tagging and classification of repositories.
Provider:
GitHub



