ikkiren/big_tokenized_dataset_half
Source: Hugging Face. Updated: 2024-10-29. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/ikkiren/big_tokenized_dataset_half
Description:
This dataset has two splits, train and test. The train split contains 2,226,862 examples (approximately 4,460,075,153.72 bytes), and the test split contains 247,430 examples (approximately 495,565,686.28 bytes). The total dataset size is 4,955,640,840 bytes, and the download size is 1,431,016,005 bytes. The features include input_ids, a sequence of int64 values.
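The figures in the description can be cross-checked with a little arithmetic. The sketch below is illustrative only: the constants are copied from the description, while the helper names `summarize` and `stream_first_example` are hypothetical. The per-example average is a rough estimate that ignores any storage overhead. The streaming helper assumes the Hugging Face `datasets` library is installed and that the repository is still publicly reachable; it is defined here but not called.

```python
# Figures copied from the dataset description.
TRAIN_EXAMPLES = 2_226_862
TEST_EXAMPLES = 247_430
TRAIN_BYTES = 4_460_075_153.72
TEST_BYTES = 495_565_686.28
TOTAL_BYTES = 4_955_640_840
DOWNLOAD_BYTES = 1_431_016_005

BYTES_PER_INT64 = 8


def summarize():
    """Derive a few sanity-check numbers from the stated sizes."""
    total_examples = TRAIN_EXAMPLES + TEST_EXAMPLES
    return {
        "total_examples": total_examples,
        # Share of all examples that fall in the test split.
        "test_fraction": TEST_EXAMPLES / total_examples,
        # The two split sizes should add up to the stated total.
        "bytes_match_total": abs(TRAIN_BYTES + TEST_BYTES - TOTAL_BYTES) < 1.0,
        # Rough average number of int64 values per example (ignores overhead).
        "avg_int64_per_example": TOTAL_BYTES / total_examples / BYTES_PER_INT64,
        # Download size relative to the in-memory dataset size.
        "download_ratio": DOWNLOAD_BYTES / TOTAL_BYTES,
    }


def stream_first_example(split="train"):
    """Hypothetical helper: fetch one example without downloading everything.

    Assumes `pip install datasets` and network access to the Hugging Face Hub.
    """
    from datasets import load_dataset  # imported lazily; optional dependency

    ds = load_dataset(
        "ikkiren/big_tokenized_dataset_half", split=split, streaming=True
    )
    return next(iter(ds))


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

Under these figures the test split works out to roughly 10% of all examples, and the two split sizes sum to the stated total. Streaming is used in the helper because the full download is over 1.4 GB; whether that trade-off suits a given workflow is a judgment call.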
Provider:
ikkiren



