CrowdMind/fortified-fineweb-edu
Source: Hugging Face. Updated 2025-11-09. Indexed 2025-11-15.
Download link:
https://hf-mirror.com/datasets/CrowdMind/fortified-fineweb-edu
Description:
This is an optimized dataset containing only a text column, intended for training data loaders that expect plain-text examples. It is derived from the Josephgflowers/Par-Four-Fineweb-Edu-Fortified dataset and exported in Parquet format. The data is sharded by character count, with each shard holding approximately 250 million characters. 17 shards have been uploaded.
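The character-based sharding described above can be illustrated with a minimal sketch. This is not the export script used for this dataset; the function name `shard_by_chars` and the rule of closing a shard just before it would exceed the budget are assumptions made for illustration. A tiny budget is used here so the behavior is easy to follow; the listing's figure would correspond to roughly `max_chars=250_000_000`.

```python
from typing import Iterable, Iterator, List


def shard_by_chars(texts: Iterable[str], max_chars: int) -> Iterator[List[str]]:
    """Group texts into shards whose total character count stays within max_chars.

    Hypothetical helper: a single text longer than max_chars is kept whole
    and placed in a shard of its own rather than being split.
    """
    shard: List[str] = []
    count = 0
    for text in texts:
        # Close the current shard before adding a text that would exceed the budget.
        if shard and count + len(text) > max_chars:
            yield shard
            shard, count = [], 0
        shard.append(text)
        count += len(text)
    # Emit the final, possibly partial, shard.
    if shard:
        yield shard


# Toy example with a 6-character budget.
shards = list(shard_by_chars(["aaaa", "bbb", "cc", "dddddd"], max_chars=6))
oversized = list(shard_by_chars(["x" * 10], max_chars=6))
```

In a real export, each yielded shard would then be written out as one Parquet file with a single text column; that step is omitted here to keep the sketch free of external dependencies.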
Provider:
CrowdMind



