prithivMLmods/OpenWeb383K
Source: Hugging Face. Updated 2025-02-06. Indexed 2025-02-15.
Download link:
https://hf-mirror.com/datasets/prithivMLmods/OpenWeb383K
Description:
The OpenWeb Datasets Web Collection is a large English web dataset derived from the FineWeb dataset, containing more than 15 trillion tokens of cleaned and deduplicated English web data from CommonCrawl. It is intended as a public-data research resource for pretraining large language models and covers a wide range of domains and topics.
Provided by:
prithivMLmods



