Araods/tokenized-data

Name: Araods/tokenized-data
Creator: Araods
Published: 2025-02-18 19:19:48
License: No description available

Source: Hugging Face. Updated 2025-02-18; indexed 2025-04-12.

Download link:

https://hf-mirror.com/datasets/Araods/tokenized-data

Description:

This dataset contains source and target text pairs together with the pre-tokenized model inputs derived from them, and is intended for natural language processing tasks. It is split into a training set of approximately 1,093,302 samples and a test set of approximately 273,326 samples. Each record has string-typed source and target fields, an int32 input_ids sequence and an int8 attention_mask sequence for model input, and an int64 labels field. The total download size is 381,993,159 bytes (about 382 MB), and the total size after decompression is 3,921,730,342 bytes (about 3.9 GB).
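The figures in the description can be sanity-checked with a few lines of arithmetic. The sketch below is only illustrative: the constant names, the `SizeSummary` class, and the `summarize` and `load_split` helpers are hypothetical and not part of the dataset. `load_split` assumes the third-party Hugging Face `datasets` library is installed and that the repository is still publicly reachable under the ID shown on this page; it is not called by the summary code.

```python
from dataclasses import dataclass

# Figures copied from the description; the page calls the sample counts approximate.
TRAIN_EXAMPLES = 1_093_302
TEST_EXAMPLES = 273_326
DOWNLOAD_BYTES = 381_993_159
DATASET_BYTES = 3_921_730_342

# Column name -> type as listed in the description. The page does not say
# whether `labels` is a sequence or a scalar, so only its dtype is recorded.
FEATURES = {
    "source": "string",
    "target": "string",
    "input_ids": "sequence of int32",
    "attention_mask": "sequence of int8",
    "labels": "int64",
}


@dataclass(frozen=True)
class SizeSummary:
    total_examples: int
    train_fraction: float
    test_fraction: float
    compression_ratio: float   # decompressed bytes / download bytes
    bytes_per_example: float   # decompressed bytes / total examples


def summarize() -> SizeSummary:
    """Derive split proportions and size ratios from the quoted figures."""
    total = TRAIN_EXAMPLES + TEST_EXAMPLES
    return SizeSummary(
        total_examples=total,
        train_fraction=TRAIN_EXAMPLES / total,
        test_fraction=TEST_EXAMPLES / total,
        compression_ratio=DATASET_BYTES / DOWNLOAD_BYTES,
        bytes_per_example=DATASET_BYTES / total,
    )


def load_split(split: str = "train"):
    """Hypothetical helper: fetch one split with the `datasets` library.

    Assumes `pip install datasets` and network access; not exercised here.
    """
    from datasets import load_dataset  # third-party, imported lazily

    return load_dataset("Araods/tokenized-data", split=split)


if __name__ == "__main__":
    print(summarize())
```

With the quoted figures, the split works out to roughly 80% train and 20% test, and the decompressed data is about ten times the download size, so it is worth checking local disk space before fetching the full dataset.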

Provided by:

Araods
