kanishka/babylm2-subset
Source: Hugging Face. Updated: 2024-07-24. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/kanishka/babylm2-subset
Description:
This dataset contains text data, split into a training set and a validation set. The training set has 8,434,896 samples totaling 380,752,170 bytes; the validation set has 834,155 samples totaling 40,660,051 bytes. The download size of the whole dataset is 235,635,151 bytes, and its total size is 421,412,221 bytes. Data files are stored under the specified paths, one per split.
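The figures in the description can be cross-checked with a little arithmetic. The sketch below is illustrative only: the dictionary layout and the helper names (`total_bytes`, `bytes_per_example`, `load_splits`) are assumptions made for this example, not part of the dataset itself. The optional loader assumes the Hugging Face `datasets` package is installed and that the repository ID resolves on the Hub; it is not called by default.

```python
# Split statistics copied from the description above.
SPLITS = {
    "train": {"num_examples": 8_434_896, "num_bytes": 380_752_170},
    "validation": {"num_examples": 834_155, "num_bytes": 40_660_051},
}
DOWNLOAD_SIZE = 235_635_151  # bytes transferred when downloading
DATASET_SIZE = 421_412_221   # total bytes reported for the dataset


def total_bytes(splits):
    """Sum the byte sizes of all splits."""
    return sum(s["num_bytes"] for s in splits.values())


def bytes_per_example(split):
    """Average size of one sample in a split, in bytes."""
    return split["num_bytes"] / split["num_examples"]


def load_splits(repo_id="kanishka/babylm2-subset"):
    """Hypothetical helper: fetch the dataset with the `datasets` library.

    Needs network access and `pip install datasets`; imported lazily so the
    arithmetic above works without the package.
    """
    from datasets import load_dataset

    return load_dataset(repo_id)


if __name__ == "__main__":
    # The two split sizes should add up to the reported total size.
    print("splits sum to total:", total_bytes(SPLITS) == DATASET_SIZE)
    for name, split in SPLITS.items():
        print(f"{name}: {bytes_per_example(split):.2f} bytes per sample")
    print(f"download/total ratio: {DOWNLOAD_SIZE / DATASET_SIZE:.3f}")
```

The split sizes do sum to the reported total (380,752,170 + 40,660,051 = 421,412,221), and the average sample is roughly 45 to 49 bytes, which suggests short, sentence-length lines of text.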
Provider:
kanishka



