skymizer/fineweb-edu-dedup-5B
Source: Hugging Face. Updated: 2024-11-18. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/skymizer/fineweb-edu-dedup-5B
Description:
This dataset contains text data and a unique identifier for each record, along with metadata: data source, URL, date, file path, language, language score, token count, score, and integer score. It is divided into a training set of 4,824,666 examples and a test set of 4,338 examples. The download size is 14,399,682,382 bytes (about 14.4 GB), and the total size is 24,894,983,399 bytes (about 24.9 GB).
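The figures above can be sanity-checked, and a split inspected without downloading everything, with a short sketch like the following. It assumes the Hugging Face `datasets` library is installed and that the repository id matches the listing title; the helper names (`human_size`, `test_fraction`, `stream_split`) are hypothetical and introduced only for illustration. Column names are deliberately not hard-coded, since the listing gives descriptions of the fields rather than their exact names.

```python
# Figures copied from the listing above.
REPO_ID = "skymizer/fineweb-edu-dedup-5B"
TRAIN_EXAMPLES = 4_824_666
TEST_EXAMPLES = 4_338
DOWNLOAD_BYTES = 14_399_682_382
TOTAL_BYTES = 24_894_983_399


def human_size(num_bytes: int) -> str:
    """Format a byte count in decimal gigabytes (1 GB = 10**9 bytes)."""
    return f"{num_bytes / 1e9:.2f} GB"


def test_fraction() -> float:
    """Share of all examples that fall in the test split."""
    return TEST_EXAMPLES / (TRAIN_EXAMPLES + TEST_EXAMPLES)


def stream_split(split: str = "test"):
    """Open one split lazily so nothing is downloaded up front.

    Assumes the `datasets` package is available; it is imported here
    so that the helpers above work without it.
    """
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split, streaming=True)


# Example usage (requires network access, so it is left commented out):
# first = next(iter(stream_split("test")))
# print(sorted(first.keys()))  # shows the actual column names
```

Streaming is used here because the full download is roughly 14.4 GB; for repeated training runs a regular, non-streaming load may be the better choice. The test split is a very small share of the data, under 0.1% of all examples.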
Provider:
skymizer



