evinsi/fineweb-edu-Llama-3.2-Instruct-Shuffled
Source: Hugging Face. Updated: 2025-02-21. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/evinsi/fineweb-edu-Llama-3.2-Instruct-Shuffled
Description:
This dataset comprises the array files and text files needed for text processing, split into training, validation, and test sets, with a total size of approximately 81.6 GB. Its features are key, URL, generation mask, input IDs, padding mask, and segment IDs. The training set contains about 6.039 million examples; the validation and test sets contain 1,000 examples each.
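
To illustrate how the listed features might fit together, here is a minimal Python sketch. It is not taken from the dataset itself: the field names (`input_ids`, `padding_mask`, `generation_mask`, `segment_ids`), the convention that a mask value of 1 means "real token" or "token to generate", and the sample record are all assumptions made for illustration.

```python
from typing import Dict, List


def supervised_token_ids(record: Dict[str, object]) -> List[int]:
    """Return the token IDs that are not padding and are marked for generation.

    Assumes (hypothetically) that 1 in padding_mask marks a real token and
    1 in generation_mask marks a token the model is expected to produce.
    """
    return [
        tok
        for tok, pad, gen in zip(
            record["input_ids"],
            record["padding_mask"],
            record["generation_mask"],
        )
        if pad == 1 and gen == 1
    ]


# Hypothetical record; the real column names and values may differ.
example = {
    "key": "doc-0",
    "url": "https://example.com/page",
    "input_ids": [101, 7, 8, 9, 0, 0],
    "padding_mask": [1, 1, 1, 1, 0, 0],
    "generation_mask": [0, 0, 1, 1, 0, 0],
    "segment_ids": [1, 1, 1, 1, 0, 0],
}

print(supervised_token_ids(example))  # [8, 9]
```

Given the roughly 81.6 GB total size, reading real records in streaming mode (for example, `load_dataset(..., streaming=True)` from the Hugging Face `datasets` library) would likely be more practical than a full download, though the actual column names should be checked against the dataset card first.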
Provider:
evinsi



