ZhentingNLP/code_pretraining_data

Name: ZhentingNLP/code_pretraining_data
Creator: ZhentingNLP
Published: 2025-03-12 04:57:17
License: No description available

Hugging Face | Updated 2025-03-12 | Indexed 2025-04-12

Download link:

https://hf-mirror.com/datasets/ZhentingNLP/code_pretraining_data
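Since the full download is large, it may help to look at a few records first. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository is reachable; the helper name `peek` is hypothetical, and the record field names are not stated on this page, so the sketch only lists whatever keys come back.

```python
from itertools import islice

# Repository ID as given on this page.
REPO_ID = "ZhentingNLP/code_pretraining_data"


def peek(n=3):
    """Return the first n records of the training split without a full download.

    Hypothetical helper for illustration. Needs `pip install datasets`
    and network access when called.
    """
    # Imported lazily so that defining the helper does not require the library.
    from datasets import load_dataset

    # streaming=True yields records lazily instead of fetching every file first.
    stream = load_dataset(REPO_ID, split="train", streaming=True)
    return list(islice(stream, n))


# Example usage (performs network requests):
#   for record in peek():
#       print(sorted(record.keys()))
```

Streaming is used here only because the page reports a download of about 16.99GB; dropping `streaming=True` would fetch the whole split before returning anything.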

Description:

Each record contains the text content, the number of LLaMA-2 model tokens in that text, and the source of the text. The dataset consists of a single training split with 26,680,129 examples, approximately 37.27GB in size. The download size is about 16.99GB.
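The figures above imply a rough per-example size and an on-disk to download ratio. The short calculation below is a sketch of that arithmetic, assuming "GB" means 10^9 bytes (the page does not say whether decimal or binary units are meant); the variable names are chosen for illustration.

```python
# Figures quoted in the description above.
NUM_EXAMPLES = 26_680_129
DATASET_GB = 37.27
DOWNLOAD_GB = 16.99

# Assumption: decimal gigabytes (10**9 bytes); the page does not specify.
BYTES_PER_GB = 10**9

# Average size of one example in bytes.
avg_bytes_per_example = DATASET_GB * BYTES_PER_GB / NUM_EXAMPLES

# How much larger the dataset is than the download.
size_to_download_ratio = DATASET_GB / DOWNLOAD_GB

print(f"average example size: {avg_bytes_per_example:.0f} bytes")
print(f"dataset size / download size: {size_to_download_ratio:.2f}")
```

Under that assumption, an average example is about 1.4 KB, and the dataset is roughly 2.2 times the size of the download.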

Provider:

ZhentingNLP
