yuhuanstudio/wikipedia-pretrain-zh-tw
Source: Hugging Face. Updated 2025-10-10. Indexed 2025-02-15.
Download link:
https://hf-mirror.com/datasets/yuhuanstudio/wikipedia-pretrain-zh-tw
Description:
This dataset contains every entry of the Taiwan Traditional Chinese (`zh-tw`) Wikipedia as of May 2023, 2,533,212 entries in total, provided in both HTML and Markdown formats. The data was retrieved through the Wikipedia API and collected strictly in the Taiwan Traditional Chinese variant, so the text does not mix Simplified and Traditional characters.
Provided by:
yuhuanstudio
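
Because the dataset is hosted on Hugging Face, a minimal sketch of inspecting a few records with the `datasets` library might look like the following. The split name `train` and the helper `peek` are assumptions for illustration and are not stated in this listing. The record field names are also undocumented here, so the sketch returns raw records instead of assuming any column.

```python
from itertools import islice

# Repository name taken from this listing.
DATASET_ID = "yuhuanstudio/wikipedia-pretrain-zh-tw"


def peek(n=3, split="train"):
    """Return the first n records without downloading the whole dataset.

    `split="train"` is an assumed default; adjust it if the repository
    uses a different split name.
    """
    # Imported lazily so this module loads even where `datasets` is not installed.
    from datasets import load_dataset

    # streaming=True iterates over records remotely instead of fetching every file.
    stream = load_dataset(DATASET_ID, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `peek()` and printing `sorted(record.keys())` for each returned record is one way to discover which fields hold the HTML and Markdown text. To route requests through the mirror in the download link, the usual approach is to set the `HF_ENDPOINT` environment variable to `https://hf-mirror.com` before the library is imported.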



