WanJuan-Vietnamese
Source: Opencsg | Updated: 2025-04-23 | Indexed: 2025-05-03
Download link:
https://www.opencsg.com/datasets/AIWizards/WanJuan-Vietnamese
Resource overview:
The WanJuan-Vietnamese (Wanjuan Silk Road-Vietnamese) corpus is a Vietnamese text dataset of more than 280 GB. It spans 7 major categories, including history, politics, culture, and real estate, and 34 subcategories. The corpus is suited to tasks such as text generation and is licensed under CC BY 4.0: users may freely share and modify the data, provided they give attribution and indicate any changes made.
Created:
2025-04-29
Collected summary
Dataset introduction

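Background and challenges
Background overview
WanJuan-Vietnamese is a Vietnamese text dataset of more than 280 GB, covering 7 major categories (history, politics, culture, and others) and 34 subcategories. It is suited to text generation tasks and is licensed under CC BY 4.0.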
The content above was collected and summarized by 遇见数据集.



