ikkiren/big_tokenized_dataset_half

Name: ikkiren/big_tokenized_dataset_half
Creator: ikkiren
Published: 2024-10-29 09:41:21
License: Not specified

Source: Hugging Face. Updated: 2024-10-29. Indexed: 2024-12-14.

Download link:

https://hf-mirror.com/datasets/ikkiren/big_tokenized_dataset_half


Description:

This dataset has two splits, train and test. The train split contains 2,226,862 examples (approximately 4,460,075,153.72 bytes); the test split contains 247,430 examples (approximately 495,565,686.28 bytes). The total dataset size is 4,955,640,840 bytes, and the download size is 1,431,016,005 bytes. The features include input_ids, a sequence of int64 values.
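The figures above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the original listing: the variable names are illustrative, and the tokens-per-example estimate assumes that input_ids accounts for essentially all stored bytes at 8 bytes per int64 value, which the listing does not confirm.

```python
# Figures copied from the dataset description; names are illustrative.
SPLITS = {
    "train": {"examples": 2_226_862, "bytes": 4_460_075_153.72},
    "test": {"examples": 247_430, "bytes": 495_565_686.28},
}
TOTAL_BYTES = 4_955_640_840
DOWNLOAD_BYTES = 1_431_016_005

# The per-split figures should add up to the reported totals.
total_examples = sum(s["examples"] for s in SPLITS.values())
split_bytes = sum(s["bytes"] for s in SPLITS.values())

# Share of examples held out for testing.
test_fraction = SPLITS["test"]["examples"] / total_examples

# Average stored size per example.
bytes_per_example = TOTAL_BYTES / total_examples

# ASSUMPTION: input_ids dominates storage and each int64 takes 8 bytes,
# so this is only a rough upper bound on the average sequence length.
approx_tokens_per_example = bytes_per_example / 8

# Ratio of in-memory dataset size to download size.
compression_ratio = TOTAL_BYTES / DOWNLOAD_BYTES

print(f"total examples:       {total_examples:,}")
print(f"split bytes sum:      {split_bytes:,.2f}")
print(f"test fraction:        {test_fraction:.4f}")
print(f"bytes per example:    {bytes_per_example:,.1f}")
print(f"approx tokens/example: {approx_tokens_per_example:,.1f}")
print(f"size / download:      {compression_ratio:.2f}")
```

Under these assumptions, the split byte counts sum to the reported total, the test split is about one tenth of all examples, and the download is roughly 3.5 times smaller than the full dataset size.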

Provider:

ikkiren
