llm-slice/babylm-open_subtitles-preprocessed
Hugging Face | Updated: 2025-06-28 | Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/llm-slice/babylm-open_subtitles-preprocessed
Description:
The dataset contains text data divided into three splits: a training set of 3,491,166 examples, a validation set of 374,595 examples, and a test set of 344,632 examples. The total size of the dataset is 139,789,999 bytes (roughly 140 MB).
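The split sizes above can be put in proportion with a few lines of arithmetic. This is a minimal sketch that uses only the numbers stated on this page. It assumes the reported byte count covers all three splits together, so the per-example average is only a rough indicator (the figure may include storage overhead rather than raw text length). The names `SPLIT_SIZES`, `TOTAL_BYTES` and `split_fractions` are hypothetical, chosen for illustration.

```python
# Split sizes as reported on the dataset page.
SPLIT_SIZES = {
    "train": 3_491_166,
    "validation": 374_595,
    "test": 344_632,
}
# Reported total dataset size in bytes (assumed to cover all three splits).
TOTAL_BYTES = 139_789_999


def split_fractions(sizes):
    """Return the total example count and each split's share of it."""
    total = sum(sizes.values())
    return total, {name: n / total for name, n in sizes.items()}


total, fractions = split_fractions(SPLIT_SIZES)
# Rough average size per example; only meaningful if the byte count
# really is the combined size of every split.
avg_bytes = TOTAL_BYTES / total

if __name__ == "__main__":
    print(f"total examples: {total:,}")  # total examples: 4,210,393
    for name, frac in fractions.items():
        print(f"{name:>10}: {frac:.2%}")
    print(f"average bytes per example: {avg_bytes:.1f}")
```

Running it shows a split of roughly 83% / 9% / 8% across training, validation and test, and an average on the order of 33 bytes per example, which is consistent with short subtitle lines.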
Provided by:
llm-slice



