pccl-org/Skylion007-openwebtext-tokenizer-gpt2-64
Source: Hugging Face | Updated: 2024-07-13 | Indexed: 2024-07-22
Download link:
https://hf-mirror.com/datasets/pccl-org/Skylion007-openwebtext-tokenizer-gpt2-64
Resource description:
The dataset has a single split, train, with 141,185,472 examples totaling 36,708,222,720 bytes. Its only feature is input_ids, a sequence of int32 values. The download size is 23,059,760,475 bytes.
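The figures above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the constants are copied from the description, while the 64-tokens-per-example figure is an assumption inferred from the "-64" suffix in the repository name, not something the listing states. The helper `stream_examples` is a hypothetical name; it relies on the Hugging Face `datasets` library's streaming mode so that nothing close to the full download is needed, and it is defined but not called here.

```python
from itertools import islice

# Figures copied from the dataset description.
REPO_ID = "pccl-org/Skylion007-openwebtext-tokenizer-gpt2-64"
NUM_EXAMPLES = 141_185_472
DATASET_BYTES = 36_708_222_720
DOWNLOAD_BYTES = 23_059_760_475

INT32_BYTES = 4
# Assumption: the "-64" suffix in the repository name is the sequence length.
ASSUMED_TOKENS_PER_EXAMPLE = 64

# Average stored size of one example, in bytes.
bytes_per_example, remainder = divmod(DATASET_BYTES, NUM_EXAMPLES)

# Bytes per example not explained by 64 int32 tokens (possibly per-row
# bookkeeping in the storage format; this interpretation is a guess).
overhead_bytes = bytes_per_example - ASSUMED_TOKENS_PER_EXAMPLE * INT32_BYTES

# Fraction of the in-memory size that actually has to be downloaded.
download_ratio = DOWNLOAD_BYTES / DATASET_BYTES


def stream_examples(n=2):
    """Return the input_ids of the first n training examples via streaming.

    Hypothetical helper; requires network access and the `datasets` package.
    """
    from datasets import load_dataset  # imported lazily; optional dependency

    ds = load_dataset(REPO_ID, split="train", streaming=True)
    return [row["input_ids"] for row in islice(ds, n)]


if __name__ == "__main__":
    print(f"bytes per example: {bytes_per_example} (remainder {remainder})")
    print(f"unexplained bytes per example: {overhead_bytes}")
    print(f"download / dataset size: {download_ratio:.3f}")
```

The total size divides evenly into 260 bytes per example, which is consistent with 64 int32 tokens (256 bytes) plus 4 further bytes per row, though the listing itself does not confirm the sequence length.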
Provider:
pccl-org



