mc-ai/fineweb-1m-00002
Source: Hugging Face. Updated 2025-04-17; indexed 2025-05-31.
Download link:
https://hf-mirror.com/datasets/mc-ai/fineweb-1m-00002
Description:
A training set of text data. Each example has eight fields: text content, unique identifier, data source, date, file path, language, language confidence score, and token count. The training set contains approximately 1 million examples, with a total size of 3.81 GB.
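As a rough illustration of the record layout described above, the sketch below models one example as a Python dataclass. The field names (`text`, `id`, `source`, and so on) are assumptions chosen to match the description; the listing does not give the actual column names, so check them against the dataset before relying on this. The size helper also assumes 1 GB = 10^9 bytes and exactly 1,000,000 examples.

```python
from dataclasses import dataclass, fields


@dataclass
class FineWebRecord:
    # NOTE: hypothetical field names inferred from the description;
    # the real column names in the dataset may differ.
    text: str              # text content
    id: str                # unique identifier
    source: str            # data source
    date: str              # date
    file_path: str         # file path
    language: str          # language type
    language_score: float  # language confidence score
    token_count: int       # number of tokens


FIELD_NAMES = tuple(f.name for f in fields(FineWebRecord))


def record_from_row(row: dict) -> FineWebRecord:
    """Build a record from a dict, failing loudly if a field is missing."""
    missing = [name for name in FIELD_NAMES if name not in row]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    return FineWebRecord(**{name: row[name] for name in FIELD_NAMES})


def approx_bytes_per_example(total_gb: float = 3.81,
                             n_examples: int = 1_000_000) -> float:
    """Average size per example, assuming 1 GB = 10**9 bytes."""
    return total_gb * 1e9 / n_examples
```

Under these assumptions, `approx_bytes_per_example()` works out to roughly 3.8 KB per example, which is a useful sanity check when estimating memory or disk needs for a subset.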
Provider:
mc-ai



