Reset23/the-stack-v2-new-cpp
Source: Hugging Face. Updated: 2025-04-07. Indexed: 2025-02-15.
Download link:
https://hf-mirror.com/datasets/Reset23/the-stack-v2-new-cpp
Description:
The dataset contains fields describing source files and the code repositories they come from. Identifier fields include blob_id, directory_id, and path. Repository metadata covers the license type, repository name, snapshot ID, revision ID, branch name, visit date, revision date, committer date, GitHub ID, star event count, and fork event count. Each record also carries file-level details such as the language, encoding, and author information. The dataset has a single training split of 11,996,061,874 bytes containing 1,000,000 samples. Under the default configuration, the training data files are stored under the pattern data/train-*.
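As a rough illustration of the figures in the description, the sketch below computes the average size per sample (about 12 KB) and shows one way the train split might be inspected. It assumes the Hugging Face `datasets` library is installed and that the dataset is still reachable under the ID `Reset23/the-stack-v2-new-cpp` taken from the download link. The helper names `average_bytes_per_sample` and `peek_rows` are hypothetical, and only the three fields named in the description are read.

```python
from itertools import islice

# Figures quoted in the dataset description.
TOTAL_BYTES = 11_996_061_874
NUM_SAMPLES = 1_000_000


def average_bytes_per_sample(total_bytes: int, num_samples: int) -> float:
    """Return the mean size of one sample in bytes."""
    return total_bytes / num_samples


def peek_rows(n: int = 3) -> list:
    """Stream the first n rows of the train split (hypothetical helper).

    Assumes network access and that the `datasets` package is installed.
    Streaming avoids downloading the full split (roughly 12 GB) up front.
    """
    from datasets import load_dataset  # imported lazily; optional dependency

    ds = load_dataset(
        "Reset23/the-stack-v2-new-cpp",  # dataset ID from the download link
        split="train",
        streaming=True,
    )
    # Keep only the fields explicitly named in the description.
    wanted = ("blob_id", "directory_id", "path")
    return [{key: row.get(key) for key in wanted} for row in islice(ds, n)]


if __name__ == "__main__":
    avg = average_bytes_per_sample(TOTAL_BYTES, NUM_SAMPLES)
    print(f"average sample size: {avg:.1f} bytes")
```

Calling `peek_rows()` is left out of the main block because it needs network access; streaming mode is chosen here only to avoid downloading the whole split before looking at a few records.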
Provider:
Reset23