yuhuanstudio/wikipedia-pretrain-zh
Source: Hugging Face | Updated: 2025-12-18 | Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/yuhuanstudio/wikipedia-pretrain-zh
Description:
wiki-pretrain-zh is a dataset built from the Simplified Chinese Wikipedia, updated on October 1, 2025. The text has been converted to Simplified Chinese with OpenCC, and text that mixes Traditional and Simplified Chinese has been removed. The dataset contains titles and the corresponding Wikipedia text content, and is suitable for scenarios that require Simplified Chinese text data.
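The cleaning described above (convert to Simplified Chinese, discard text that mixes both scripts) can be illustrated with a minimal sketch. This is not the dataset author's pipeline: the tiny TRAD_TO_SIMP table is a hypothetical stand-in for OpenCC's dictionaries, the record fields "title" and "text" are assumed names, and the order of the steps (drop mixed records first, then convert) is also an assumption.

```python
# Hypothetical, deliberately tiny Traditional -> Simplified table.
# A real pipeline would use OpenCC's dictionaries instead of this stand-in.
TRAD_TO_SIMP = {
    "體": "体", "簡": "简", "國": "国", "學": "学",
    "語": "语", "維": "维", "數": "数", "據": "据",
}
SIMP_ONLY = set(TRAD_TO_SIMP.values())
_TABLE = str.maketrans(TRAD_TO_SIMP)


def to_simplified(text: str) -> str:
    """Replace every Traditional character found in the table."""
    return text.translate(_TABLE)


def is_mixed(text: str) -> bool:
    """True if the text holds both Traditional-only and Simplified-only characters."""
    has_trad = any(ch in TRAD_TO_SIMP for ch in text)
    has_simp = any(ch in SIMP_ONLY for ch in text)
    return has_trad and has_simp


def clean(records):
    """Drop mixed-script records, then convert the remaining ones to Simplified."""
    cleaned = []
    for rec in records:
        if is_mixed(rec["text"]):
            continue  # assumed policy: mixed-script text is discarded outright
        cleaned.append({
            "title": to_simplified(rec["title"]),
            "text": to_simplified(rec["text"]),
        })
    return cleaned


sample = [
    {"title": "國語", "text": "國語和數學"},  # Traditional only: converted
    {"title": "维基", "text": "维基数据"},    # Simplified only: kept as-is
    {"title": "混用", "text": "简體"},        # mixed scripts: dropped
]
result = clean(sample)
```

With the sample above, the first record is converted, the second passes through unchanged, and the third is dropped because it contains both 简 (Simplified) and 體 (Traditional).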
Provider:
yuhuanstudio



