ainz/cosmopajama
Hugging Face | Updated 2025-10-03 | Indexed 2025-10-25
Download link:
https://hf-mirror.com/datasets/ainz/cosmopajama
Description:
The dataset consists of three parts: cosmopedia, slimpajama_train, and slimpajama_val. It is intended primarily for text processing tasks, and each sample is a single text string. The cosmopedia part contains 100,000 samples, slimpajama_train contains 274,450 training samples, and slimpajama_val contains 467 validation samples. The total dataset size is 1.59 GB, with a download size of 930 MB.
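As a minimal sketch of how the three parts might be accessed, the snippet below records the sample counts stated in the description and wraps the Hugging Face `datasets` loader. It assumes that each part is exposed as a split of the `ainz/cosmopajama` repository; the listing does not confirm this, and the parts could instead be separate configurations. The names `PART_SIZES`, `total_samples`, and `load_part` are hypothetical helpers introduced for illustration.

```python
# Sample counts exactly as stated in the dataset description.
PART_SIZES = {
    "cosmopedia": 100_000,
    "slimpajama_train": 274_450,
    "slimpajama_val": 467,
}


def total_samples(sizes=PART_SIZES):
    """Return the combined number of samples across all parts."""
    return sum(sizes.values())


def load_part(name, repo_id="ainz/cosmopajama"):
    """Load one part of the dataset.

    Assumption: each part is a split of the repository. If the parts
    are configurations instead, the call below would need adjusting.
    """
    if name not in PART_SIZES:
        raise ValueError(f"unknown part {name!r}; expected one of {sorted(PART_SIZES)}")
    # Imported lazily so the helpers above work without the third-party
    # `datasets` package installed; loading requires network access.
    from datasets import load_dataset

    return load_dataset(repo_id, split=name)


if __name__ == "__main__":
    print("samples listed in the description:", total_samples())
    val = load_part("slimpajama_val")
    # Compare the loaded size with the count given in the description.
    print(len(val), "loaded;", PART_SIZES["slimpajama_val"], "listed")
```

Starting with slimpajama_val keeps the first download small, which makes it a convenient check that the split names and counts match the listing before fetching the larger parts.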
Provided by:
ainz



