4rtemi5/processed_tsb_bert_dataset
Source: Hugging Face | Updated: 2025-01-09 | Indexed: 2025-02-15
Download link:
https://hf-mirror.com/datasets/4rtemi5/processed_tsb_bert_dataset
Description:
The dataset contains tokenized text data for training. Each example has four features: an input ID sequence, a token type ID sequence, an attention mask sequence, and a special token mask sequence. The dataset has a single training split of 6,674,363 examples, about 23.02 GB in total. A default configuration is provided, which specifies the data file path for the training split.
Provider:
4rtemi5
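
The four features named in the description can be sanity-checked with a small helper before training. The following is a minimal sketch, not the provider's own code. The column names (`input_ids`, `token_type_ids`, `attention_mask`, `special_tokens_mask`) are assumptions based on the names a BERT tokenizer conventionally emits, since the listing only describes the features in words. The helper names `check_example` and `stream_train_split` are hypothetical, and the check that all four sequences share one length is likewise an assumption about how the data was tokenized.

```python
from typing import Any, Dict, Iterator

# Assumed column names: the listing describes the four features in words only.
EXPECTED_COLUMNS = (
    "input_ids",
    "token_type_ids",
    "attention_mask",
    "special_tokens_mask",
)


def check_example(example: Dict[str, Any]) -> int:
    """Return the shared sequence length, or raise ValueError if malformed."""
    missing = [c for c in EXPECTED_COLUMNS if c not in example]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Assumption: all four per-token sequences are aligned and equally long.
    lengths = {len(example[c]) for c in EXPECTED_COLUMNS}
    if len(lengths) != 1:
        raise ValueError(f"columns differ in length: {sorted(lengths)}")
    return lengths.pop()


def stream_train_split(
    repo_id: str = "4rtemi5/processed_tsb_bert_dataset",
) -> Iterator[Dict[str, Any]]:
    """Iterate lazily over the train split instead of downloading ~23 GB first."""
    # Imported here so check_example works without the library installed.
    from datasets import load_dataset  # pip install datasets

    return iter(load_dataset(repo_id, split="train", streaming=True))


# Example usage (needs network access, so it is left commented out):
# first = next(stream_train_split())
# print(check_example(first))
```

Streaming is chosen here only because the split is large; dropping `streaming=True` downloads and caches the full dataset instead.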



