skymizer/fineweb-edu-dedup-45B-1-of-2
Source: Hugging Face | Updated: 2025-01-11 | Indexed: 2025-02-15
Download link:
https://hf-mirror.com/datasets/skymizer/fineweb-edu-dedup-45B-1-of-2
Description:
This dataset consists of a single training split of text data. Each sample contains the text content, a unique identifier, and metadata. The metadata records the dump, URL, date, file path, language, language score, token count, score, and integer score. The training split holds 21,684,169 samples and is reported as 111,908,944,850.07036 bytes (about 111.9 GB). The download size is 64,973,572,231 bytes (about 65.0 GB).
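The figures and record layout above can be checked with a short Python sketch. It assumes the repository id from the download link, a split named "train", and hypothetical field names (`text`, `id`, `metadata`, and the metadata keys listed in the code); the listing only describes these fields in prose, so the actual column names may differ. The streaming helper relies on the Hugging Face `datasets` library and is not called here, because it needs network access.

```python
from itertools import islice

# Sizes as reported in the description above.
TRAIN_BYTES = 111_908_944_850.07036
NUM_SAMPLES = 21_684_169
DOWNLOAD_BYTES = 64_973_572_231

# Assumed field names; the listing describes them only in prose.
TOP_LEVEL_KEYS = ("text", "id", "metadata")
METADATA_KEYS = (
    "dump", "url", "date", "file_path", "language",
    "language_score", "token_count", "score", "int_score",
)


def avg_bytes_per_sample() -> float:
    """Average size of one sample in the training split."""
    return TRAIN_BYTES / NUM_SAMPLES


def download_ratio() -> float:
    """Download size as a fraction of the reported training split size."""
    return DOWNLOAD_BYTES / TRAIN_BYTES


def missing_fields(record: dict) -> list:
    """Return the assumed fields that are absent from a record."""
    missing = [k for k in TOP_LEVEL_KEYS if k not in record]
    meta = record.get("metadata") or {}
    missing += [f"metadata.{k}" for k in METADATA_KEYS if k not in meta]
    return missing


def stream_first_samples(n: int = 3) -> list:
    """Stream the first n samples without downloading the full dataset.

    Requires the `datasets` package and network access.
    """
    from datasets import load_dataset  # imported lazily on purpose

    ds = load_dataset(
        "skymizer/fineweb-edu-dedup-45B-1-of-2",
        split="train",      # assumed split name
        streaming=True,
    )
    return list(islice(ds, n))


if __name__ == "__main__":
    print(f"average bytes per sample: {avg_bytes_per_sample():.1f}")
    print(f"download / train size:    {download_ratio():.3f}")
```

Running `missing_fields` on the records returned by `stream_first_samples()` is a quick way to see whether the assumed field names match the real schema before committing to a download of roughly 65 GB.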
Provided by:
skymizer



