noname0202/constant-length-pretraining-dataset
Source: Hugging Face. Updated: 2025-03-25. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/noname0202/constant-length-pretraining-dataset
Overview:
The dataset has two fields: input_ids, a sequence of int32 values, and labels, a sequence of int64 values. It has a single split, train, with 71,497,642 examples. The dataset size is 110 GB and the download size is 31 GB. The default configuration specifies the data file path for the train split.
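Because the full dataset is large, it may be more practical to stream it than to download it up front. The following is a minimal sketch, not an official loading recipe. It assumes the Hugging Face `datasets` library is installed and that the repository is reachable from your network. The helper names `stream_train_split` and `average_bytes_per_example` are hypothetical, and the size estimate assumes that "110 GB" means decimal gigabytes.

```python
# Figures taken from the listing above.
REPO_ID = "noname0202/constant-length-pretraining-dataset"
NUM_EXAMPLES = 71_497_642
DATASET_BYTES = 110 * 10**9  # assumption: "110 GB" is decimal gigabytes


def average_bytes_per_example(total_bytes=DATASET_BYTES, num_examples=NUM_EXAMPLES):
    """Rough average storage footprint of one example, in bytes."""
    return total_bytes / num_examples


def stream_train_split(repo_id=REPO_ID):
    """Return an iterable over the train split without downloading it first.

    The import is done lazily so that the size helper above works even
    when the `datasets` library is not installed.
    """
    from datasets import load_dataset

    # streaming=True yields examples lazily instead of materializing
    # the whole split on disk.
    return load_dataset(repo_id, split="train", streaming=True)
```

Under these assumptions, `average_bytes_per_example()` comes out at roughly 1.5 KB per example. Calling `next(iter(stream_train_split()))` should return a dictionary with the `input_ids` and `labels` fields described above, which is a quick way to check sequence lengths and dtypes before committing to a full download.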
Provided by:
noname0202



