xfxcwynlc/longcrawl-packed32k-tokenized2
Source: Hugging Face. Updated: 2025-03-29. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/xfxcwynlc/longcrawl-packed32k-tokenized2
Description:
This is a sequence dataset with three fields: input_ids, labels, and attention_mask. It has a single split, train, with 306,000 samples totaling 130,354,776,000 bytes (about 130 GB). The dataset uses the default configuration, and the training data files match the path pattern data/train-*.
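The figures in the description can be turned into a rough per-sample storage estimate before deciding how to download the data. The sketch below is illustrative only: the byte and sample counts come from the listing, while the 32,768-token sequence length is an assumption inferred from "packed32k" in the dataset name and is not stated in the listing. The helper `stream_first_sample` is a hypothetical convenience wrapper around the Hugging Face `datasets` library, and it assumes the repository is publicly reachable.

```python
# Figures taken from the listing.
TOTAL_BYTES = 130_354_776_000
NUM_SAMPLES = 306_000

# ASSUMPTION: "packed32k" in the dataset name means 32 * 1024 tokens per sample.
ASSUMED_SEQ_LEN = 32 * 1024

REPO_ID = "xfxcwynlc/longcrawl-packed32k-tokenized2"


def bytes_per_sample(total_bytes: int, num_samples: int) -> tuple[int, int]:
    """Return (whole bytes per sample, leftover bytes) for the reported totals."""
    return divmod(total_bytes, num_samples)


def stream_first_sample(repo_id: str = REPO_ID) -> dict:
    """Hypothetical helper: fetch one record without downloading the full dataset.

    Requires the `datasets` package and network access, so the import is kept
    local and the function is not called by default.
    """
    from datasets import load_dataset

    # streaming=True iterates over the remote files lazily.
    stream = load_dataset(repo_id, split="train", streaming=True)
    return next(iter(stream))


if __name__ == "__main__":
    per_sample, remainder = bytes_per_sample(TOTAL_BYTES, NUM_SAMPLES)
    print(f"bytes per sample: {per_sample} (remainder {remainder})")
    # Under the assumed sequence length, this is the combined storage cost of
    # input_ids, labels, and attention_mask for one token position.
    print(f"bytes per token position: {per_sample / ASSUMED_SEQ_LEN:.4f}")
```

The total divides evenly into 425,996 bytes per sample, which is consistent with fixed-length samples. Since the full dataset is about 130 GB, streaming a single record first is a cheap way to confirm the field names and the actual sequence length.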
Provider:
xfxcwynlc



