gokulsrinivasagan/processed_book_corpus_cleaned
Source: Hugging Face. Updated 2024-12-04. Indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/gokulsrinivasagan/processed_book_corpus_cleaned
Resource description:
The dataset has two splits, train and validation. The train split contains 2,277,342 examples and the validation split contains 120,706 examples. Each example has three features: input_ids (the input token ID sequence), attention_mask (the attention mask sequence), and special_tokens_mask (the special-token mask sequence). The download size is 2,053,364,016 bytes (about 2.05 GB), and the total dataset size is reported as 7,395,532,692.332596 bytes (about 7.40 GB).
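The description above can be made concrete with a minimal Python sketch. It assumes the Hugging Face `datasets` library is installed and that the masks follow the usual tokenizer convention (1 in attention_mask means "attend", 1 in special_tokens_mask means "special token"); the listing does not confirm either point. The helper names `load_splits` and `check_example` and the toy record are hypothetical, invented for illustration.

```python
def load_splits(repo_id="gokulsrinivasagan/processed_book_corpus_cleaned"):
    """Open both splits in streaming mode so the ~2 GB download is not fetched up front."""
    # Imported lazily so the rest of this sketch runs without the library installed.
    from datasets import load_dataset  # assumption: `pip install datasets`
    return load_dataset(repo_id, streaming=True)


def check_example(example):
    """Verify the three documented features are aligned and count real content tokens."""
    ids = example["input_ids"]
    attn = example["attention_mask"]
    special = example["special_tokens_mask"]
    # Assumption: all three sequences have one entry per token position.
    assert len(ids) == len(attn) == len(special)
    # Count positions that are attended to and are not special tokens.
    return sum(1 for a, s in zip(attn, special) if a == 1 and s == 0)


# Hypothetical record, not taken from the dataset.
toy = {
    "input_ids": [101, 7592, 2088, 102, 0, 0],
    "attention_mask": [1, 1, 1, 1, 0, 0],
    "special_tokens_mask": [1, 0, 0, 1, 1, 1],
}
print(check_example(toy))  # 2
```

To try this against the real data, something like `check_example(next(iter(load_splits()["train"])))` should work, provided the split names match those listed above.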
Provided by:
gokulsrinivasagan



