felix-ai-alt/pretokenized__HuggingFaceFW_fineweb-edu__EleutherAI__gpt-neo-125m
Source: Hugging Face. Updated 2024-11-06. Indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/felix-ai-alt/pretokenized__HuggingFaceFW_fineweb-edu__EleutherAI__gpt-neo-125m
Description:
Each example in the dataset carries the following features: a unique identifier, data source, file path, language, language score, token count, score, integer score, input ID sequence, and attention mask sequence. The dataset has a single training split of 5,765,432 examples, with a reported total size of 24,942,837,662.25636 bytes and a download size of 13,644,753,644 bytes.
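To put the sizes quoted in the description into perspective, the following is a minimal sketch that derives the average size per example and the ratio of download size to total size. The function name `summarize` and the dictionary keys are illustrative choices, not part of the dataset or any library; the three input figures are copied from the description as listed.

```python
# Figures copied from the dataset description.
NUM_EXAMPLES = 5_765_432
DATASET_BYTES = 24_942_837_662.25636  # reported total size, fractional as listed
DOWNLOAD_BYTES = 13_644_753_644


def summarize(num_examples, dataset_bytes, download_bytes):
    """Derive rough per-example and storage statistics (hypothetical helper)."""
    gib = 2 ** 30  # bytes per GiB
    return {
        "avg_bytes_per_example": dataset_bytes / num_examples,
        "dataset_gib": dataset_bytes / gib,
        "download_gib": download_bytes / gib,
        "download_to_dataset_ratio": download_bytes / dataset_bytes,
    }


stats = summarize(NUM_EXAMPLES, DATASET_BYTES, DOWNLOAD_BYTES)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Under these figures, each example averages a little over 4 KB, and the download is roughly half the size of the full dataset, so disk space should be planned for the larger of the two numbers.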
Provider:
felix-ai-alt



