textcleanlm/textclean-2B-raw
Source: Hugging Face. Updated 2025-08-14. Indexed 2025-09-13.
Download link:
https://hf-mirror.com/datasets/textcleanlm/textclean-2B-raw
Resource description:
This dataset was produced by merging 24 Parquet files stored in the `data/eai_data_files/data` directory (file pattern: all `*.parquet`) into a single output shard, `data/train-00000-of-00001.parquet`, compressed with `zstd`. The README does not describe the dataset's content or intended use.
Provider:
textcleanlm



