kothasuhas/dclm_tokenized_n7481318_ctx32_commbin5
Source: Hugging Face. Updated 2025-10-14. Indexed 2025-10-25.
Download link:
https://hf-mirror.com/datasets/kothasuhas/dclm_tokenized_n7481318_ctx32_commbin5
Resource summary:
The dataset has a single feature, input_ids, which is a sequence of integers. It has one split, train, containing 7,481,318 examples and totaling 987,533,976 bytes, which is also the size of the entire dataset. The download size is 33,836,611 bytes. No description of the dataset's content or intended use is provided.
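The sizes quoted in the summary can be checked with a little arithmetic. The sketch below uses only those three figures. Two things in it are assumptions and not stated in the summary: that "ctx32" in the dataset name means 32 tokens per example, and that each token ID takes 4 bytes. Under those assumptions the per-example size works out to 128 bytes of tokens plus 4 bytes of overhead, which would be consistent with fixed-length examples, but the summary does not confirm this.

```python
# Figures quoted in the dataset summary.
NUM_EXAMPLES = 7_481_318
DATASET_BYTES = 987_533_976
DOWNLOAD_BYTES = 33_836_611

# Assumptions (NOT stated in the summary): 32 tokens per example, read from
# "ctx32" in the dataset name, and 4 bytes per token ID.
ASSUMED_TOKENS_PER_EXAMPLE = 32
ASSUMED_BYTES_PER_TOKEN = 4

# Average stored size of one example, and whether it divides evenly.
bytes_per_example, remainder = divmod(DATASET_BYTES, NUM_EXAMPLES)

# Bytes left over per example once the assumed token payload is subtracted.
assumed_token_bytes = ASSUMED_TOKENS_PER_EXAMPLE * ASSUMED_BYTES_PER_TOKEN
overhead_per_example = bytes_per_example - assumed_token_bytes

# Ratio of stored size to download size.
compression_ratio = DATASET_BYTES / DOWNLOAD_BYTES

print(f"bytes per example: {bytes_per_example} (remainder {remainder})")
print(f"overhead per example under the assumptions: {overhead_per_example}")
print(f"stored size / download size: {compression_ratio:.1f}")
```

The total divides evenly into 132 bytes per example, which is what one would expect if every example has the same length. The download is roughly 29 times smaller than the stored size.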
Provider:
kothasuhas



