kanishka/babylm2-rewritten-clean
Source: Hugging Face. Updated: 2025-01-12. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/kanishka/babylm2-rewritten-clean
Description:
The dataset has a single feature, text, of type string. It is split into a training set of 12002861 samples and a validation set of 1226247 samples. The download size is 355719445 bytes (about 356 MB), and the full dataset size is 577908816 bytes (about 578 MB). The dataset has one configuration, default, whose training and validation data files are stored under data/train-* and data/validation-* respectively.
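The figures in the description can be turned into a few derived quantities, and the splits can be loaded by name. The following is a minimal sketch, not an official recipe: the helper names summarize and load_splits are hypothetical, the repository ID is taken from the download link, and load_splits assumes the Hugging Face datasets package is installed and the hub is reachable. The average bytes per sample is only a rough estimate, since the reported dataset size may include storage overhead.

```python
# Figures copied from the dataset description.
TRAIN_SAMPLES = 12_002_861
VALIDATION_SAMPLES = 1_226_247
DOWNLOAD_BYTES = 355_719_445
DATASET_BYTES = 577_908_816


def summarize():
    """Derive simple statistics from the published figures (hypothetical helper)."""
    total = TRAIN_SAMPLES + VALIDATION_SAMPLES
    return {
        "total_samples": total,
        # Share of all samples that belong to the validation split.
        "validation_fraction": VALIDATION_SAMPLES / total,
        # Rough estimate: the reported size may include storage overhead.
        "avg_bytes_per_sample": DATASET_BYTES / total,
        # How much smaller the download is than the full dataset size.
        "download_to_dataset_ratio": DOWNLOAD_BYTES / DATASET_BYTES,
    }


def load_splits(repo_id="kanishka/babylm2-rewritten-clean"):
    """Load both splits (hypothetical helper; needs the `datasets` package and network access)."""
    # Imported lazily so the arithmetic above runs without the package installed.
    from datasets import load_dataset

    # Returns a DatasetDict; per the description, keys are "train" and "validation".
    return load_dataset(repo_id)


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

Calling load_splits()["validation"] would fetch the smaller split, which is the cheaper one to inspect first.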
Provider:
kanishka



