zerostratos/fineweb-2-vie-2020
Source: Hugging Face. Updated 2025-08-28; indexed 2025-10-25.
Download link:
https://hf-mirror.com/datasets/zerostratos/fineweb-2-vie-2020
Summary:
The dataset includes the following fields: text content, a unique identifier, URL, date, file path, language, language score, language script, MinHash cluster size, and top languages. It has a single train split of 5,000,000 examples totaling 26,975,472,156 bytes (about 27 GB).
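The language-score and MinHash-cluster-size fields are typically used to filter records before training. The sketch below shows one way to do that over any iterable of records. It rests on a few assumptions. The column names `text`, `language_score`, and `minhash_cluster_size` are taken from upstream FineWeb-2, since this card only describes the fields in prose. `minhash_cluster_size` is read as the size of the document's near-duplicate cluster. The helper names and thresholds are hypothetical.

```python
from typing import Any, Iterable, Iterator, Mapping, Optional

# ASSUMPTION: column names follow upstream FineWeb-2 (`text`, `language_score`,
# `minhash_cluster_size`); this card only lists the fields in prose.
# The helper names and default thresholds below are hypothetical.


def keep_record(
    record: Mapping[str, Any],
    min_language_score: float = 0.9,
    max_cluster_size: Optional[int] = None,
) -> bool:
    """Return True if the record passes the illustrative quality thresholds."""
    # Drop documents whose language-ID confidence is below the threshold.
    if record["language_score"] < min_language_score:
        return False
    # Optionally drop documents that belonged to large near-duplicate clusters.
    if max_cluster_size is not None and record["minhash_cluster_size"] > max_cluster_size:
        return False
    return True


def filter_records(
    records: Iterable[Mapping[str, Any]],
    min_language_score: float = 0.9,
    max_cluster_size: Optional[int] = None,
) -> Iterator[Mapping[str, Any]]:
    """Lazily yield only the records that pass keep_record."""
    for record in records:
        if keep_record(record, min_language_score, max_cluster_size):
            yield record
```

Because `filter_records` is a generator, it works on a streamed split without materializing the whole dataset. For example, one could pass it the result of `datasets.load_dataset("zerostratos/fineweb-2-vie-2020", split="train", streaming=True)`, which avoids downloading all ~27 GB up front.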
Provided by:
zerostratos



