five

Araods/tokenized-data

Source: Hugging Face. Updated: 2025-02-18. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/Araods/tokenized-data
Description:
This dataset contains source and target text pairs together with the tokenized model inputs derived from them, and is intended for natural language processing tasks. It is split into a training set of 1,093,302 samples and a test set of 273,326 samples. Each record has string-type source and target fields, an int32 input_ids sequence and an int8 attention_mask sequence for model input, and an int64 labels field. The download size is 381,993,159 bytes, and the dataset occupies 3,921,730,342 bytes after decompression.
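
The figures in the description can be turned into a few derived numbers that are handy when planning storage or a training run. The following is a minimal sketch using only the Python standard library; the helper name `summarize` and the dictionary keys are illustrative assumptions and not part of the dataset or any library. It suggests a split of roughly 80% training to 20% test and an on-disk size about ten times the download size.

```python
# Figures copied from the dataset description; nothing here touches the network.
TRAIN_SAMPLES = 1_093_302
TEST_SAMPLES = 273_326
DOWNLOAD_BYTES = 381_993_159
DATASET_BYTES = 3_921_730_342


def summarize(train, test, download_bytes, dataset_bytes):
    """Hypothetical helper: derive split fractions and size ratios."""
    total = train + test
    return {
        "total_samples": total,
        "train_fraction": train / total,
        "test_fraction": test / total,
        # How much larger the decompressed data is than the download.
        "expansion_ratio": dataset_bytes / download_bytes,
        # Average decompressed footprint of one record, in bytes.
        "bytes_per_sample": dataset_bytes / total,
    }


summary = summarize(TRAIN_SAMPLES, TEST_SAMPLES, DOWNLOAD_BYTES, DATASET_BYTES)

for name, value in summary.items():
    # Integers print as-is; ratios are rounded for readability.
    shown = value if isinstance(value, int) else round(value, 3)
    print(f"{name}: {shown}")
```

The average of the decompressed size over all records is only a rough guide, since the per-record size depends on the length of the text and token sequences.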
Provider:
Araods