opendatalab/WanJuan-Arabic
Source: Hugging Face. Updated 2025-04-22. Indexed 2025-04-08.
Download link:
https://hf-mirror.com/datasets/opendatalab/WanJuan-Arabic
Description:
The WanJuan-Arabic corpus (WanJuan Silk Road Arabic corpus) is a large-scale dataset of more than 220 GB, organized into 7 major categories and 34 subcategories. It covers history, politics, culture, real estate, shopping, weather, dining, encyclopedic content, and professional knowledge. It is a high-quality, open-source web-text dataset aimed at low-resource languages.
Provider:
opendatalab



