p1atdev/202408-at20240906-tokenized-shuffle-241015-384ktokens
Source: Hugging Face; updated 2024-10-15; indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/p1atdev/202408-at20240906-tokenized-shuffle-241015-384ktokens
Description:
The dataset has two splits: a training set of 7,635,898 samples (about 928.87 MB) and a test set of 10,000 samples (about 1.22 MB). The total dataset size is about 930.09 MB, and the download size is about 524.23 MB. The dataset has a feature named input_ids, which is a sequence of int32 values.
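To make the schema concrete, here is a minimal stdlib-only sketch that treats one record as a dict with an `input_ids` list, checks that every id fits in the int32 range, and shows that packing the ids as little-endian int32 takes 4 bytes per token. The helper names (`is_valid_input_ids`, `pack_int32`, `unpack_int32`) and the sample record are hypothetical illustrations, not part of the dataset or any library; the real token ids depend on a tokenizer the listing does not name. The size constants are the figures reported above.

```python
import struct

# Split sizes as reported in the listing (MB).
TRAIN_MB = 928.87
TEST_MB = 1.22
TOTAL_MB = 930.09

# Range of a signed 32-bit integer.
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def is_valid_input_ids(input_ids):
    """Hypothetical helper: True if input_ids is a list of ints that fit in int32."""
    return isinstance(input_ids, list) and all(
        isinstance(t, int) and not isinstance(t, bool) and INT32_MIN <= t <= INT32_MAX
        for t in input_ids
    )


def pack_int32(input_ids):
    """Serialize token ids as little-endian int32 (4 bytes per token)."""
    return struct.pack(f"<{len(input_ids)}i", *input_ids)


def unpack_int32(buf):
    """Inverse of pack_int32: read the buffer back as a list of int32 values."""
    return list(struct.unpack(f"<{len(buf) // 4}i", buf))


if __name__ == "__main__":
    # Hypothetical record; these ids are made up for illustration.
    record = {"input_ids": [1, 523, 9021, 2]}
    assert is_valid_input_ids(record["input_ids"])

    buf = pack_int32(record["input_ids"])
    print(len(buf))  # 4 tokens * 4 bytes each
    assert unpack_int32(buf) == record["input_ids"]

    # The reported split sizes add up to the reported total.
    print(round(TRAIN_MB + TEST_MB, 2))
```

On real data, the same validity check could be run over records loaded with the Hugging Face `datasets` library (`load_dataset` with the repository id above).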
Provided by:
p1atdev