CrowdMind/finewebedu-1M-samples-2048t
Source: Hugging Face. Updated 2025-09-19; indexed 2025-10-25.
Download link:
https://hf-mirror.com/datasets/CrowdMind/finewebedu-1M-samples-2048t
Description:
This dataset is a collection of text documents, each with a unique identifier. Records may also carry metadata: URL, file path, language, language confidence score, token count, score, and integer score. The dataset has a single training split of approximately 1 million examples, with a file size of 6777992813 bytes (about 6.78 GB).
Provider:
CrowdMind
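As a rough illustration of the figures in the description, the sketch below estimates the average on-disk size per example and shows one way the training split might be streamed. It assumes the Hugging Face `datasets` library is installed and that the repository ID matches the path in the download link. The helper names `approx_bytes_per_example` and `stream_examples` are hypothetical, and the example count is the approximate "1 million" from the description, not an exact figure.

```python
from typing import Any, Dict, Iterator

TOTAL_BYTES = 6_777_992_813   # file size stated in the description
APPROX_EXAMPLES = 1_000_000   # "approximately 1 million"; exact count not stated


def approx_bytes_per_example(total_bytes: int = TOTAL_BYTES,
                             n_examples: int = APPROX_EXAMPLES) -> float:
    """Average stored bytes per example, based on the approximate count."""
    return total_bytes / n_examples


def stream_examples(repo_id: str = "CrowdMind/finewebedu-1M-samples-2048t",
                    limit: int = 3) -> Iterator[Dict[str, Any]]:
    """Yield the first `limit` rows of the train split without a full download.

    Assumption: the repo ID is taken from the download link above, and the
    split is named "train" as the description suggests.
    """
    # Imported lazily so the size arithmetic works without `datasets` installed.
    from datasets import load_dataset

    ds = load_dataset(repo_id, split="train", streaming=True)
    for i, row in enumerate(ds):
        if i >= limit:
            break
        yield row
```

Calling `approx_bytes_per_example()` gives roughly 6778 bytes per example. Iterating over `stream_examples()` and printing `sorted(row.keys())` would show which of the metadata fields listed in the description are actually present, since the description only says they may be included.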



