KnutJaegersberg/essential-web-smol-sample-fdc-partitioned
Source: Hugging Face | Updated: 2025-06-22 | Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/KnutJaegersberg/essential-web-smol-sample-fdc-partitioned
Description:
Essential-Web is a 24-trillion-token web dataset with extensive document-level metadata, designed to enable rapid dataset curation through SQL-like filtering. This dataset is partitioned by Free Decimal Correspondence (FDC) level-2 categories so that researchers can quickly identify and filter relevant content domains.
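As a rough illustration of what SQL-like filtering over document-level metadata can look like, the sketch below loads a few toy records into an in-memory SQLite table and selects documents by category. The column names (`doc_id`, `fdc_level2`, `token_count`), the category codes, and the helper `filter_by_category` are all hypothetical and chosen for illustration; they are not taken from the dataset's actual schema.

```python
import sqlite3

# Toy records standing in for documents. The columns and the category
# codes ("51", "61") are hypothetical placeholders, not the real schema.
ROWS = [
    ("doc-1", "51", 1200),
    ("doc-2", "61", 800),
    ("doc-3", "51", 300),
]


def filter_by_category(rows, category, min_tokens=0):
    """Return the doc_ids in `category` with at least `min_tokens` tokens."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE docs (doc_id TEXT, fdc_level2 TEXT, token_count INTEGER)"
        )
        conn.executemany("INSERT INTO docs VALUES (?, ?, ?)", rows)
        cur = conn.execute(
            "SELECT doc_id FROM docs "
            "WHERE fdc_level2 = ? AND token_count >= ? "
            "ORDER BY doc_id",
            (category, min_tokens),
        )
        return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()


# Select documents in the hypothetical category "51" with 500+ tokens.
selected = filter_by_category(ROWS, "51", min_tokens=500)
print(selected)
```

With a dataset already partitioned by category, the category predicate would typically be replaced by reading only the matching partition, leaving the remaining metadata conditions for the query itself.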
Provided by:
KnutJaegersberg



