droo303/wikitext_mlm_nsp
Source: Hugging Face | Updated: 2024-07-31 | Indexed: 2024-07-06
Download link:
https://hf-mirror.com/datasets/droo303/wikitext_mlm_nsp
Description:
This dataset was prepared for preprocessing a foundation model for natural language processing. It is modeled on BERT pretraining and provides inputs and labels for two tasks: next sentence prediction (NSP) and masked language modeling (MLM). Each sample has five features: input_ids (the encoded input sequence), attention_mask, token_type_ids (segment identifiers), labels (the MLM targets), and next_sentence_labels (the NSP targets). The dataset is split into a training set of 1,715,460 samples and a validation set of 3,674 samples. The total download size is 1,076,906,240 bytes (about 1.08 GB), and the total dataset size is 12,364,011,728 bytes (about 12.36 GB). The dataset uses the default configuration, with the training and validation data files stored under data/train-* and data/validation-*, respectively.
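To illustrate how the five features relate to each other in a BERT-style record, here is a minimal sketch that builds one MLM + NSP example from two already-tokenized sentences. It rests on several assumptions the dataset card does not confirm: bert-base-uncased special-token IDs ([CLS] = 101, [SEP] = 102, [MASK] = 103) and vocabulary size, the Hugging Face convention of -100 in labels for positions with no MLM loss, the 15% selection with 80/10/10 replacement described for BERT, and the BertForPreTraining convention that a next-sentence label of 0 means sentence B follows sentence A. The helper `build_mlm_nsp_example` is hypothetical and is not part of the dataset or any library.

```python
import random
from typing import Dict, List

# Assumed special-token IDs and vocabulary size of bert-base-uncased; the
# dataset card does not say which tokenizer produced input_ids.
CLS_ID, SEP_ID, MASK_ID = 101, 102, 103
VOCAB_SIZE = 30522
IGNORE_INDEX = -100  # Hugging Face convention: no MLM loss at this position


def build_mlm_nsp_example(
    tokens_a: List[int],
    tokens_b: List[int],
    is_next: bool,
    mask_prob: float = 0.15,
    seed: int = 0,
) -> Dict[str, object]:
    """Hypothetical helper: one BERT-style MLM + NSP record from two token-ID lists."""
    rng = random.Random(seed)
    original = [CLS_ID] + tokens_a + [SEP_ID] + tokens_b + [SEP_ID]
    input_ids = list(original)
    # Segment A ([CLS], sentence A, first [SEP]) gets 0; sentence B and final [SEP] get 1.
    token_type_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    attention_mask = [1] * len(input_ids)  # no padding in this sketch
    labels = [IGNORE_INDEX] * len(input_ids)

    for i, tok in enumerate(original):
        # Never mask special tokens; select each other token with probability mask_prob.
        if tok in (CLS_ID, SEP_ID) or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must predict the original token here
        r = rng.random()
        if r < 0.8:
            input_ids[i] = MASK_ID  # 80%: replace with [MASK]
        elif r < 0.9:
            input_ids[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
        # remaining 10%: leave the original token in place

    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "token_type_ids": token_type_ids,
        "labels": labels,
        # BertForPreTraining convention: 0 = B continues A, 1 = B is random.
        "next_sentence_labels": 0 if is_next else 1,
    }


# Arbitrary token IDs standing in for two tokenized sentences.
example = build_mlm_nsp_example([2023, 2003], [1037, 3231, 1012], is_next=True)
print(example["token_type_ids"])  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Real pretraining records are usually also padded or truncated to a fixed length, with attention_mask set to 0 on padding positions; the card does not state whether this dataset does so, so it is worth inspecting a few samples before relying on any of the conventions above.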
Provider:
droo303



