llm-slice/babylm-simple_wiki-preprocessed
Source: Hugging Face. Updated 2025-06-28; indexed 2025-11-01.
Download link:
https://hf-mirror.com/datasets/llm-slice/babylm-simple_wiki-preprocessed
Overview:
This is a machine learning text dataset split into training, validation, and test sets. The training set contains 530,927 examples (86,763,203 bytes), the validation set contains 49,363 examples (8,304,471 bytes), and the test set contains 49,580 examples (7,752,821 bytes). The total download size is 60,987,021 bytes and the total dataset size is 102,820,495 bytes. The dataset has a single feature, a text field named text.
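The figures above can be cross-checked with a little arithmetic. The sketch below is only an illustration: the dictionary layout and variable names are assumptions made for this example, the split keys simply mirror the wording of the description, and the numbers are copied from the listing rather than read from the dataset itself.

```python
# Split statistics copied from the description above (not read from the dataset).
# The dictionary keys are illustrative labels, not confirmed split identifiers.
SPLITS = {
    "train": {"examples": 530_927, "bytes": 86_763_203},
    "validation": {"examples": 49_363, "bytes": 8_304_471},
    "test": {"examples": 49_580, "bytes": 7_752_821},
}
REPORTED_DATASET_BYTES = 102_820_495
REPORTED_DOWNLOAD_BYTES = 60_987_021

# Totals across the three splits.
total_examples = sum(s["examples"] for s in SPLITS.values())
total_bytes = sum(s["bytes"] for s in SPLITS.values())

# Ratio of download size to dataset size.
download_ratio = REPORTED_DOWNLOAD_BYTES / REPORTED_DATASET_BYTES

# Share of examples per split and average bytes per example.
for name, s in SPLITS.items():
    share = s["examples"] / total_examples
    avg_bytes = s["bytes"] / s["examples"]
    print(f"{name:<10} {share:6.1%} of examples, {avg_bytes:6.1f} bytes/example")

print(f"total examples: {total_examples:,}")
print(f"split sizes match reported total: {total_bytes == REPORTED_DATASET_BYTES}")
print(f"download / dataset size: {download_ratio:.2f}")
```

The three split sizes add up exactly to the reported total dataset size, and the download is roughly 0.59 times that size.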
Provider:
llm-slice



