kothasuhas/dclm_tokenized_n54714504_ctx32_commbin5
Source: Hugging Face. Updated 2025-10-14; indexed 2025-10-25.
Download link:
https://hf-mirror.com/datasets/kothasuhas/dclm_tokenized_n54714504_ctx32_commbin5
Description:
The dataset has a single feature, input_ids (a sequence of integers), and a single train split of approximately 54.71 million examples. The total dataset size is about 7.22 GB, and the download size is about 247.45 MB.
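As a rough sanity check on the figures in the description, the following minimal Python sketch derives the average size per example and the ratio of total size to download size. It makes several assumptions the listing does not confirm: that the "n54714504" in the dataset name is the exact example count, and that "GB" and "MB" mean 10^9 and 10^6 bytes. The helper names are hypothetical. The last function shows one way the train split might be streamed with the Hugging Face `datasets` library, which avoids downloading the whole dataset up front.

```python
REPO_ID = "kothasuhas/dclm_tokenized_n54714504_ctx32_commbin5"

# Approximate figures taken from the description.
NUM_EXAMPLES = 54_714_504  # assumption: "n54714504" in the name is the row count
TOTAL_BYTES = 7.22e9       # assumption: "GB" means 10**9 bytes
DOWNLOAD_BYTES = 247.45e6  # assumption: "MB" means 10**6 bytes


def bytes_per_example(total_bytes=TOTAL_BYTES, num_examples=NUM_EXAMPLES):
    """Average size of one example in bytes."""
    return total_bytes / num_examples


def size_to_download_ratio(total_bytes=TOTAL_BYTES, download_bytes=DOWNLOAD_BYTES):
    """How many times larger the full dataset is than the download."""
    return total_bytes / download_bytes


def stream_train_split(repo_id=REPO_ID):
    """Return a lazily iterated train split (requires network access)."""
    # Imported here so the arithmetic helpers work without the library installed.
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(repo_id, split="train", streaming=True)


if __name__ == "__main__":
    print(f"bytes per example: {bytes_per_example():.1f}")
    print(f"size / download:   {size_to_download_ratio():.1f}")
```

Under these assumptions each example averages a little over 130 bytes. That would fit a short, fixed-length sequence of 32-bit integers, if "ctx32" in the name refers to a context length of 32, but this reading is a guess from the name and is not stated in the listing.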
Provider:
kothasuhas



