tommyp111/gg-fineweb-mix-tokenized-gemma2-2b
Source: Hugging Face. Updated 2024-11-01. Indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/tommyp111/gg-fineweb-mix-tokenized-gemma2-2b
Description:
The dataset has two features: tokens, a sequence of int64 values, and source, a string. It consists of a single training split with 1,322,152 examples and a total size of 10,850,900,084 bytes. The download size is 2,663,527,576 bytes. In the default configuration, the data files are located at data/train-*.
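As a rough sanity check on the figures in the description, the sketch below derives the average size per example and the ratio of dataset size to download size. It also shows one possible way to read the training split lazily with the Hugging Face `datasets` library. The repository ID is taken from the download link; the helper names (`bytes_per_example`, `compression_ratio`, `stream_train_split`) are hypothetical and chosen for illustration only.

```python
# Figures quoted in the dataset description.
NUM_EXAMPLES = 1_322_152
DATASET_BYTES = 10_850_900_084
DOWNLOAD_BYTES = 2_663_527_576

# Repository ID as it appears in the download link.
REPO_ID = "tommyp111/gg-fineweb-mix-tokenized-gemma2-2b"


def bytes_per_example() -> float:
    """Average size of one example in the prepared dataset."""
    return DATASET_BYTES / NUM_EXAMPLES


def compression_ratio() -> float:
    """How much larger the prepared dataset is than the download."""
    return DATASET_BYTES / DOWNLOAD_BYTES


def stream_train_split():
    """Return a lazy iterator over the train split (needs network access).

    The import is kept local so the arithmetic helpers above work even
    when the `datasets` package is not installed.
    """
    from datasets import load_dataset  # pip install datasets

    return load_dataset(REPO_ID, split="train", streaming=True)


if __name__ == "__main__":
    print(f"{bytes_per_example():.1f} bytes per example on average")
    print(f"{compression_ratio():.2f}x dataset size relative to download size")
```

Streaming avoids fetching the full download before inspecting any data. Going by the description, each record returned by `next(iter(stream_train_split()))` should be a dictionary with a `tokens` list and a `source` string, though this has not been verified here.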
Provider:
tommyp111



