Araods/tokenized-data
Source: Hugging Face. Updated 2025-02-18. Indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/Araods/tokenized-data
Description:
This dataset contains source and target text pairs together with pre-tokenized model inputs for natural language processing tasks. It is split into a training set of about 1,093,302 samples and a test set of about 273,326 samples. Each record has string-type source and target fields, an int32 input_ids sequence and an int8 attention_mask sequence for model input, and an int64 labels field. The total download size is 381,993,159 bytes, and the total size after decompression is 3,921,730,342 bytes.
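As a quick sanity check on the figures in the description, the sketch below derives the train/test proportions, the ratio of unpacked size to download size, and the average unpacked size per sample. The constants are copied from the description; the variable names are illustrative and not part of the dataset.

```python
# Figures copied from the dataset description; names are illustrative only.
TRAIN_SAMPLES = 1_093_302
TEST_SAMPLES = 273_326
DOWNLOAD_BYTES = 381_993_159
UNPACKED_BYTES = 3_921_730_342

total_samples = TRAIN_SAMPLES + TEST_SAMPLES

# Share of samples in each split.
train_fraction = TRAIN_SAMPLES / total_samples
test_fraction = TEST_SAMPLES / total_samples

# How much larger the data is on disk than over the wire.
expansion_ratio = UNPACKED_BYTES / DOWNLOAD_BYTES

# Average unpacked size of one record (text fields plus token arrays).
bytes_per_sample = UNPACKED_BYTES / total_samples

print(f"total samples:     {total_samples:,}")
print(f"train / test:      {train_fraction:.1%} / {test_fraction:.1%}")
print(f"expansion ratio:   {expansion_ratio:.2f}x")
print(f"bytes per sample:  {bytes_per_sample:.0f}")
```

The per-sample figure is an average over whole records, so it says nothing directly about the tokenized sequence length, which the description does not state.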
Provider:
Araods



