skymizer/pretraining-5B
Source: Hugging Face | Updated: 2024-11-13 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/skymizer/pretraining-5B
Description:
This dataset contains text samples and their corresponding token counts, split into a training set and a test set. The training set has 5,732,022 samples and the test set has 1,141. The total dataset size is 21,282,400,367 bytes (about 21.3 GB), and the download size is 12,437,646,142 bytes (about 12.4 GB).
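To put the listed figures in proportion, the short Python sketch below derives a few ratios from them: total sample count, the test split's share, a rough average size per sample, and download size relative to total size. It uses only the numbers stated in the description; the constant and function names are illustrative. The per-sample average is approximate, since the stated total presumably also covers the token-count column and storage overhead, not just raw text.

```python
# Figures copied from the dataset description; names are illustrative.
TRAIN_SAMPLES = 5_732_022
TEST_SAMPLES = 1_141
DATASET_BYTES = 21_282_400_367   # stated total dataset size
DOWNLOAD_BYTES = 12_437_646_142  # stated download size


def summarize() -> dict:
    """Derive simple ratios from the published split sizes and byte counts."""
    total = TRAIN_SAMPLES + TEST_SAMPLES
    return {
        "total_samples": total,
        # Share of all samples that belong to the test split.
        "test_fraction": TEST_SAMPLES / total,
        # Rough average: total bytes spread evenly over every sample.
        "avg_bytes_per_sample": DATASET_BYTES / total,
        # How large the download is relative to the stated dataset size.
        "download_ratio": DOWNLOAD_BYTES / DATASET_BYTES,
        "dataset_gb": DATASET_BYTES / 1e9,
        "download_gb": DOWNLOAD_BYTES / 1e9,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        if isinstance(value, float):
            print(f"{key}: {value:,.4f}")
        else:
            print(f"{key}: {value:,}")
```

Taken at face value, the listing works out to 5,733,163 samples in total, averaging roughly 3.7 KB each. The test split holds about 0.02% of them, and the download is roughly 58% of the stated dataset size.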
Provided by:
skymizer



