introspector/meta-introspector
Source: Hugging Face | Updated: 2026-01-18 | Indexed: 2026-02-07
Download link:
https://hf-mirror.com/datasets/introspector/meta-introspector
Description:
The meta-introspector dataset is an index of 3,660,152 files (about 3.66 million) with git provenance tracking for files that belong to a git repository. It supports code search across millions of files, tracing files back to the git commits they came from, repository analysis at scale, and machine-learning training on real-world code. Each record contains the file's path, the root path of its git repository, the commit hash, the branch name, the remote URL, a GitHub/GitLab URL for the file, and a flag indicating whether the file is tracked by git. About 64.7% of the files have git repository information. The dataset is 437 MB in compressed Parquet format. Suggested uses include code search, provenance tracking, repository analysis, ML training, and research.
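The provenance fields listed above are enough to rebuild a link to the exact revision of a file. The following is a minimal Python sketch under stated assumptions: the `FileRecord` class and its field names are hypothetical stand-ins for the dataset's columns (the real Parquet schema may name them differently), only github.com and gitlab.com remotes are recognized, and the URLs use the standard GitHub `/blob/<commit>/<path>` and GitLab `/-/blob/<commit>/<path>` shapes. The dataset already ships a GitHub/GitLab file URL per file, so this only illustrates how the fields relate; in practice the records would be loaded from the Parquet file, for example with `pandas.read_parquet`.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Iterable, Optional


# Hypothetical record layout mirroring the fields described above;
# the real Parquet column names may differ.
@dataclass
class FileRecord:
    path: str                  # absolute path of the file
    git_root: Optional[str]    # root of the enclosing git repository, if any
    commit: Optional[str]      # commit hash the file was indexed at
    branch: Optional[str]      # branch name
    remote_url: Optional[str]  # e.g. "git@github.com:owner/repo.git"
    is_tracked: bool           # whether git tracks this file


# Matches HTTPS and SSH (git@host:owner/repo) remotes on github.com and
# gitlab.com only; self-hosted GitLab instances are not handled here.
_REMOTE_RE = re.compile(
    r"^(?:https://|git@)(?P<host>github\.com|gitlab\.com)[:/](?P<repo>.+?)(?:\.git)?/?$"
)


def web_file_url(rec: FileRecord) -> Optional[str]:
    """Return a commit-pinned GitHub/GitLab URL for the file, or None."""
    if not (rec.is_tracked and rec.git_root and rec.commit and rec.remote_url):
        return None
    m = _REMOTE_RE.match(rec.remote_url)
    if m is None:
        return None
    try:
        rel = PurePosixPath(rec.path).relative_to(rec.git_root)
    except ValueError:  # path does not lie inside git_root
        return None
    # GitHub uses /blob/<ref>/<path>; GitLab uses /-/blob/<ref>/<path>.
    blob = "blob" if m["host"] == "github.com" else "-/blob"
    return f"https://{m['host']}/{m['repo']}/{blob}/{rec.commit}/{rel.as_posix()}"


def git_coverage(records: Iterable[FileRecord]) -> float:
    """Fraction of records that carry git repository information."""
    records = list(records)
    if not records:
        return 0.0
    return sum(1 for r in records if r.git_root) / len(records)
```

For a record with path `/home/u/src/proj/lib/util.py`, root `/home/u/src/proj`, commit `abc123` and remote `git@github.com:example/proj.git`, `web_file_url` returns `https://github.com/example/proj/blob/abc123/lib/util.py`. Applied to the whole index, `git_coverage` would correspond to the 64.7% figure quoted above.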
Provided by:
introspector



