Marcus2112/minipile_k440_high-inter_density
Source: Hugging Face. Updated 2025-01-23; indexed 2025-02-15.
Download link:
https://hf-mirror.com/datasets/Marcus2112/minipile_k440_high-inter_density
Description:
This dataset is derived from The Pile Deduplicated and has two features: text and pile_idx. It is split into training, validation, and test sets containing 995,270, 9,952, and 499 examples respectively. The total size of the dataset is 6,132,024,785 bytes.
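
The split sizes above can be sanity-checked with a little arithmetic, and a single split can be pulled from the Hub. The following is a minimal sketch, not the author's tooling. It assumes the splits are exposed under the conventional names `train`, `validation`, and `test`, that the stated byte count covers all three splits, and that the Hugging Face `datasets` library is installed for the optional loading step. The helper names `split_shares` and `load_split` are hypothetical.

```python
# Split sizes as stated in the dataset description.
# The keys are assumed split names; check the dataset card if loading fails.
SPLIT_SIZES = {"train": 995270, "validation": 9952, "test": 499}

# Total size as stated; assumed here to cover all three splits.
TOTAL_BYTES = 6132024785

REPO_ID = "Marcus2112/minipile_k440_high-inter_density"


def split_shares(sizes):
    """Return each split's fraction of the total example count."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def load_split(split="test"):
    """Load one split from the Hugging Face Hub (requires network access)."""
    # Imported lazily so the arithmetic above runs without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)


if __name__ == "__main__":
    total_examples = sum(SPLIT_SIZES.values())
    print("total examples:", total_examples)
    for name, share in split_shares(SPLIT_SIZES).items():
        print(f"{name}: {share:.4%}")
    # Rough average only: it divides the stated size by the example count.
    print("mean bytes per example:", round(TOTAL_BYTES / total_examples))
```

Loading the small test split first, for example `load_split("test")`, is a cheap way to confirm the `text` and `pile_idx` columns before downloading the much larger training split.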
Provider:
Marcus2112



