m-a-p/FineFineWeb-sample
Hugging Face. Updated 2024-12-19. Indexed 2025-04-08.
Download link:
https://hf-mirror.com/datasets/m-a-p/FineFineWeb-sample
Description:
FineFineWeb is a fine-grained, domain-labeled web corpus covering many domains, including aerospace, agronomy, art, and astronomy. Each domain contains a large number of samples and tokens. The dataset is built in several steps, including deduplication, URL labeling, coarse recall, and fine recall. It is suitable for text classification, text-to-text generation, and text generation tasks.
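The construction steps named above can be illustrated with a toy sketch. This is not the FineFineWeb pipeline. It assumes exact-hash deduplication, a hand-written host-to-domain table for URL labeling, and two keyword-based scores standing in for the coarse and fine recall stages. The hosts, keywords, thresholds, and function names are all hypothetical.

```python
import hashlib
from urllib.parse import urlparse

# Hypothetical seed data: none of these hosts, keywords, or thresholds
# come from FineFineWeb; they only make the sketch runnable.
HOST_LABELS = {"nasa.example": "aerospace", "farm.example": "agronomy"}
DOMAIN_KEYWORDS = {
    "aerospace": {"rocket", "orbit", "aircraft"},
    "agronomy": {"soil", "crop", "harvest"},
}


def deduplicate(docs):
    """Keep the first document for each whitespace-normalized, lowercased text."""
    seen, unique = set(), []
    for doc in docs:
        normalized = " ".join(doc["text"].lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def label_by_url(doc):
    """Return the domain label for the document's host, or None if unknown."""
    return HOST_LABELS.get(urlparse(doc["url"]).hostname)


def coarse_score(text, domain):
    """Cheap signal: fraction of the domain's keywords that appear in the text."""
    keywords = DOMAIN_KEYWORDS[domain]
    return len(set(text.lower().split()) & keywords) / len(keywords)


def fine_score(text, domain):
    """Stricter signal: fraction of the text's tokens that are domain keywords."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in DOMAIN_KEYWORDS[domain])
    return hits / len(tokens)


def recall(docs, domain, coarse_min=0.3, fine_min=0.2):
    """Deduplicate, gather candidates loosely, then keep only strong matches."""
    candidates = [
        d for d in deduplicate(docs)
        if label_by_url(d) == domain or coarse_score(d["text"], domain) >= coarse_min
    ]
    return [d for d in candidates if fine_score(d["text"], domain) >= fine_min]
```

Calling `recall(docs, "aerospace")` on a list of `{"url": ..., "text": ...}` dictionaries returns the documents that survive all stages. A real pipeline would likely replace both keyword scores with trained classifiers.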
Provider:
m-a-p



