pietrolesci/finewebedu-20B
Source: Hugging Face. Updated 2025-03-16; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/pietrolesci/finewebedu-20B
Description:
This is an English-language text-generation dataset in the 10M to 100M rows size category. It is a subset of the HuggingFaceFW/fineweb-edu/100BT dataset made up of its first 20,200,000 rows: 20M rows for training and 200k rows for validation. Two configurations are provided: the default configuration and the bpe32000minipile configuration, which contains 21.6B tokens.
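The "first 20,200,000 rows, split 20M / 200k" construction can be illustrated with a short Python sketch. It is not the publisher's script. It assumes the training split is the first 20M rows and the validation split is the next 200k rows, an ordering the description does not state. The helper name `split_prefix` is hypothetical, and it works on any iterable of rows, so it never needs to read the full 100BT source.

```python
from itertools import islice
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")

# Split sizes taken from the description above.
N_TRAIN = 20_000_000
N_VAL = 200_000


def split_prefix(rows: Iterable[T], n_train: int, n_val: int) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: take the first n_train + n_val rows of a stream.

    Assumption (not stated by the dataset card): the first n_train rows form
    the training split and the following n_val rows form the validation split.
    Rows beyond n_train + n_val are never read.
    """
    it = iter(rows)
    train = list(islice(it, n_train))  # first n_train rows
    val = list(islice(it, n_val))      # the next n_val rows
    return train, val


# Scaled-down usage: 5 "training" rows and 2 "validation" rows from a stream of 10.
train, val = split_prefix(range(10), 5, 2)
print(train, val)  # [0, 1, 2, 3, 4] [5, 6]
```

For the real subset, `split_prefix(stream, N_TRAIN, N_VAL)` would hold 20.2M rows in memory. A production version would write each split to disk incrementally instead of building lists.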
Provided by:
pietrolesci



