UViko

NIAID Data Ecosystem, indexed 2026-03-11
Download link:
https://doi.org/10.7910/DVN/EDWHKB
Description:
UViko (University of Ulsan Vietnamese-Korean Parallel Corpus) is a large-scale Vietnamese-Korean parallel corpus of about 454K sentence pairs, annotated on the Korean side with word-sense disambiguation (WSD) and morphological analysis. It was built at the NLP Lab, University of Ulsan, Republic of Korea (http://nlplab.ulsan.ac.kr).

Corpus statistics:
- Total sentences: 454,751 pairs
- Average sentence length:
    Vietnamese: 19.3
    Korean: 12.0
    Korean with word-sense disambiguation and morphological analysis: 21.4
- Total tokens:
    Vietnamese: 8,790,197
    Korean: 5,435,686
    Korean with word-sense disambiguation and morphological analysis: 5,435,686
- Total vocabulary:
    Vietnamese: 40,090
    Korean: 397,130
    Korean with word-sense disambiguation: 68,856
    Korean with morphological analysis: 63,735

The Korean word-sense disambiguation and morphological analysis were performed with UTagger (http://nlplab.ulsan.ac.kr/doku.php?id=utagger), which consists of the following processes:
- Korean morphological analysis
- POS tagging
- Sense-code tagging (a sense code, which represents a specific sense of a word, is defined in the Standard Korean Language Dictionary)

UViko_original_sample.txt and UViko_WSD_MA_sample.txt are sample files with 5,000 sentence pairs each. To use the full corpus, contact the authors by e-mail: haivv279@gmail.com
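The reported average sentence lengths can be cross-checked against the token totals. The sketch below is a minimal arithmetic check, assuming that "average sentence length" means total tokens divided by total sentence pairs; the listing does not state this definition, and the names `TOTAL_PAIRS`, `TOTAL_TOKENS`, and `average_length` are illustrative. The annotated Korean figures are left out because the listing gives the same token total for the plain and the annotated Korean text while reporting different averages.

```python
# Figures copied from the dataset description; nothing here reads the corpus itself.
TOTAL_PAIRS = 454_751
TOTAL_TOKENS = {"Vietnamese": 8_790_197, "Korean": 5_435_686}
REPORTED_AVG = {"Vietnamese": 19.3, "Korean": 12.0}


def average_length(total_tokens: int, total_sentences: int) -> float:
    """Mean tokens per sentence, rounded to one decimal place.

    Assumption: the listing's averages are tokens divided by sentence pairs.
    """
    return round(total_tokens / total_sentences, 1)


computed = {
    lang: average_length(tokens, TOTAL_PAIRS)
    for lang, tokens in TOTAL_TOKENS.items()
}

for lang, avg in computed.items():
    print(f"{lang}: computed {avg}, reported {REPORTED_AVG[lang]}")
```

Under that assumption, the computed values for Vietnamese and Korean agree with the reported averages to one decimal place.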
Created:
2019-11-11