falcon-refinedweb
Source: OpenXLab (indexed 2026-04-18)
Download link:
https://openxlab.org.cn/datasets/OpenDataLab/falcon-refinedweb
Description:
Falcon RefinedWeb is a large-scale English dataset created for pretraining large language models. It may be used on its own or augmented with curated sources (e.g., Wikipedia, StackOverflow).
It was built on top of CommonCrawl using stringent filtering and extensive deduplication.
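
To give a rough idea of what "filtering and deduplication" means in practice, here is a minimal sketch in Python. It is not the RefinedWeb pipeline: the function names, the `min_words` threshold, and the choice of exact-match hashing on normalized text are all assumptions made for illustration only.

```python
import hashlib


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace (illustrative choice)."""
    return " ".join(text.lower().split())


def filter_and_dedup(docs, min_words=5):
    """Drop very short documents, then drop exact duplicates.

    `min_words` is a hypothetical quality threshold, not a value taken
    from the dataset's documentation.
    """
    seen = set()
    kept = []
    for doc in docs:
        norm = normalize(doc)
        # Filtering step: skip documents with too few words.
        if len(norm.split()) < min_words:
            continue
        # Deduplication step: hash the normalized text and skip repeats.
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(doc)
    return kept


docs = [
    "The quick brown fox jumps over the lazy dog.",
    "the  quick brown fox jumps over the lazy dog.",  # duplicate after normalization
    "too short",
    "Another sufficiently long example document here.",
]
result = filter_and_dedup(docs)
```

Here `result` keeps the first and last documents. Web-scale pipelines generally also need near-duplicate detection, which this exact-match sketch does not attempt.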
Provider:
OpenDataLab
Created:
2023-12-06



