gokulsrinivasagan/processed_book_corpus-ld-20
Source: Hugging Face | Updated: 2024-12-05 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/gokulsrinivasagan/processed_book_corpus-ld-20
Description:
This dataset includes features for natural language processing tasks: input_ids, attention_mask, special_tokens_mask, and lda_lables. It is divided into a training split with 2,277,342 samples and a validation split with 120,706 samples. The total download size is 2,528,415,605 bytes, and the total dataset size is 7,788,859,904 bytes. The data files are located at data/train-* and data/validation-*.
Provider:
gokulsrinivasagan
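
The split sizes and feature names in the description can be put to work with a short Python sketch. It is illustrative only: the constants are copied from the description, while the helper names (`validation_fraction`, `stream_split`) are hypothetical. It assumes that the Hugging Face `datasets` package is installed and that the repository is reachable on the Hub. Streaming is used so that a first look at the data does not require the full download of about 2.5 GB.

```python
# Figures and names copied from the dataset description; nothing here is measured.
REPO_ID = "gokulsrinivasagan/processed_book_corpus-ld-20"
FEATURES = ("input_ids", "attention_mask", "special_tokens_mask", "lda_lables")
SPLIT_SIZES = {"train": 2_277_342, "validation": 120_706}


def validation_fraction(split_sizes=SPLIT_SIZES):
    """Return the share of all samples that belong to the validation split."""
    total = sum(split_sizes.values())
    return split_sizes["validation"] / total


def stream_split(split="train"):
    """Open one split lazily (hypothetical helper).

    Assumes the third-party `datasets` package is installed and the
    Hugging Face Hub is reachable; returns an iterable over examples.
    """
    from datasets import load_dataset  # imported lazily, only when called

    return load_dataset(REPO_ID, split=split, streaming=True)


# Example use (requires network access, so it is left commented out):
# first = next(iter(stream_split("validation")))
# print({name: type(first[name]).__name__ for name in FEATURES})
```

With the numbers from the description, `validation_fraction()` comes out at roughly 5% of the 2,398,048 total samples.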



