PatrickHaller/fineweb-2-de-10B
Hugging Face · updated 2025-05-27 · indexed 2025-10-25
Download link:
https://hf-mirror.com/datasets/PatrickHaller/fineweb-2-de-10B
Description:
This is a training set of text data. Each record contains the text itself, a unique identifier, the source URL, a date, a file path, the detected language, a language-identification confidence score, the language script, the MinHash cluster size, and the top predicted languages for the text. The dataset contains 15,265,871 examples totaling 50,148,023,468 bytes (about 50.1 GB).
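As a rough sketch, the published totals can be turned into per-record figures, and the listed fields can be modeled as a typed record for local filtering. The field names, their types, and the filter thresholds below are assumptions chosen for illustration, not the dataset's confirmed schema; check the dataset card before relying on them.

```python
from dataclasses import dataclass

# Totals quoted in the dataset description.
NUM_EXAMPLES = 15_265_871
SIZE_BYTES = 50_148_023_468


def size_summary(num_examples: int = NUM_EXAMPLES, size_bytes: int = SIZE_BYTES) -> dict[str, float]:
    """Back-of-the-envelope figures derived from the published totals."""
    return {
        "gb": size_bytes / 1e9,        # decimal gigabytes
        "gib": size_bytes / 2**30,     # binary gibibytes
        "avg_bytes_per_example": size_bytes / num_examples,
    }


@dataclass
class Record:
    """Hypothetical record layout mirroring the fields listed in the description.

    Field names and types are assumptions; confirm them against the real schema.
    """
    text: str
    id: str
    url: str
    date: str
    file_path: str
    language: str
    language_score: float       # language-identification confidence
    language_script: str
    minhash_cluster_size: int   # size of the MinHash near-duplicate cluster
    top_langs: str              # top predicted languages, kept as raw text here


def keep(record: Record, min_score: float = 0.9, max_cluster: int = 100) -> bool:
    """Hypothetical filter: require confident language ID and cap the cluster size.

    The thresholds are illustrative defaults, not values from the dataset authors.
    """
    return record.language_score >= min_score and record.minhash_cluster_size <= max_cluster


summary = size_summary()
print(f"{summary['gb']:.1f} GB, {summary['gib']:.1f} GiB, "
      f"~{summary['avg_bytes_per_example']:.0f} bytes per example")
# 50.1 GB, 46.7 GiB, ~3285 bytes per example

# A made-up record used only to exercise the filter.
sample = Record(
    text="Guten Tag!", id="doc-1", url="https://example.org/", date="2024-01-01",
    file_path="example/path", language="deu", language_score=0.97,
    language_script="Latn", minhash_cluster_size=3, top_langs="{}",
)
print(keep(sample))  # True
```

With the Hugging Face `datasets` library, rows could instead be streamed with `load_dataset("PatrickHaller/fineweb-2-de-10B", split="train", streaming=True)` (the `"train"` split name is an assumption) and checked against the real column names before adapting `keep`.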
Provider:
PatrickHaller



