Chinese-Fineweb-Edu
Source: arXiv (indexed 2025-09-30)
Download link:
https://huggingface.co/datasets/opencsg/chinese-fineweb-edu
Description:
A Chinese-language dataset used as replay data during continual pre-training. Replaying it alongside the new training corpus helps keep the model's Chinese-language proficiency from degrading.
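The replay idea can be illustrated with a minimal sketch: while streaming the new training corpus, occasionally draw a document from the replay set instead. The function name `mix_with_replay`, the `replay_ratio` parameter, and the default of 0.1 are hypothetical choices for illustration, not values stated by the dataset's authors.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def mix_with_replay(
    new_docs: Iterable[T],
    replay_docs: Iterable[T],
    replay_ratio: float = 0.1,  # assumed value, not from the source
    seed: int = 0,
) -> Iterator[T]:
    """Yield a stream where each item comes from replay_docs with
    probability replay_ratio, otherwise from new_docs.

    Order within each source is preserved. The stream ends as soon as
    the source chosen for the next draw is exhausted.
    """
    if not 0.0 <= replay_ratio <= 1.0:
        raise ValueError("replay_ratio must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the mix is reproducible
    new_it = iter(new_docs)
    replay_it = iter(replay_docs)
    while True:
        source = replay_it if rng.random() < replay_ratio else new_it
        try:
            yield next(source)
        except StopIteration:
            return
```

In practice the replay documents could be streamed from the download link above, for example with the Hugging Face `datasets` library, and passed in as `replay_docs`. A suitable ratio depends on the model and the new corpus and would need to be tuned.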
Provider:
Hugging Face



