UViko
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://doi.org/10.7910/DVN/EDWHKB
Description:
UViko (University of Ulsan Vietnamese-Korean Parallel Corpus) is a large-scale Vietnamese-Korean parallel corpus of 454K sentence pairs. It was built at the NLP Lab., University of Ulsan, Rep. of Korea (http://nlplab.ulsan.ac.kr). The Korean side is also provided with word-sense annotation, produced by Korean word-sense disambiguation (WSD) and morphological analysis (MA). Detailed statistics:

UViko: Vietnamese-Korean Parallel Corpus
- Total sentence pairs: 454,751
- Average sentence length (tokens per sentence):
  - Vietnamese: 19.3
  - Korean: 12.0
  - Korean with WSD and MA: 21.4
- Total tokens:
  - Vietnamese: 8,790,197
  - Korean: 5,435,686
  - Korean with WSD and MA: 5,435,686
- Total vocabulary size:
  - Vietnamese: 40,090
  - Korean: 397,130
  - Korean with WSD: 68,856
  - Korean with MA: 63,735

Korean word-sense disambiguation and morphological analysis were performed with UTagger (http://nlplab.ulsan.ac.kr/doku.php?id=utagger), which carries out the following processes:
- Korean morphological analysis
- POS tagging
- Sense-code tagging (a sense code identifies a specific sense of a word, as defined in the Standard Korean Language Dictionary)

UViko_original_sample.txt and UViko_WSD_MA_sample.txt are sample files containing 5,000 sentence pairs. To use the full corpus, please contact us by e-mail: haivv279@gmail.com
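Statistics like the ones listed above (sentence pairs, total tokens, average sentence length, vocabulary size) can be recomputed from a local copy of the corpus. The Python sketch below rests on two assumptions. First, it uses a hypothetical file layout with one sentence pair per line and the Vietnamese and Korean sentences separated by a tab. Second, it uses plain whitespace tokenization. Neither the actual layout of the sample files nor the tokenization behind the published figures is documented here, so the numbers it produces may differ. The helper names `read_pairs` and `corpus_stats` are illustrative.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_pairs(path: str) -> Iterator[Tuple[str, str]]:
    """Hypothetical reader: assumes one pair per line as Vietnamese<TAB>Korean."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if "\t" in line:  # skip lines that do not look like a pair
                vi, ko = line.split("\t", 1)
                yield vi, ko


def corpus_stats(pairs: Iterable[Tuple[str, str]]) -> Dict[str, object]:
    """Count pairs, tokens, average sentence length and vocabulary size per side.

    Tokens are whitespace-separated strings (an assumption). For Vietnamese this
    counts space-separated syllables rather than multi-syllable words.
    """
    n_pairs = 0
    tokens = {"vi": 0, "ko": 0}
    vocab = {"vi": set(), "ko": set()}
    for vi, ko in pairs:
        n_pairs += 1
        for side, sentence in (("vi", vi), ("ko", ko)):
            toks = sentence.split()
            tokens[side] += len(toks)
            vocab[side].update(toks)  # distinct token types per side
    stats: Dict[str, object] = {"pairs": n_pairs}
    for side in ("vi", "ko"):
        stats[side] = {
            "tokens": tokens[side],
            # average sentence length = total tokens / number of sentence pairs
            "avg_len": tokens[side] / n_pairs if n_pairs else 0.0,
            "vocab": len(vocab[side]),
        }
    return stats


if __name__ == "__main__":
    toy = [("tôi đi học", "나는 학교에 간다"), ("tôi ăn cơm", "나는 밥을 먹는다")]
    print(corpus_stats(toy))
```

With a file that matches the assumed layout, the call would be `corpus_stats(read_pairs("UViko_original_sample.txt"))`. The reader is a generator, so large files are processed line by line. Only the vocabulary sets grow with corpus size.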
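The annotated Korean side combines morphological analysis, POS tags and sense codes. The Python sketch below parses one such word, but it assumes a hypothetical token shape that UTagger has not been confirmed to use. Under that shape, each space-separated word is a sequence of morphemes joined by "+", and each morpheme is written as surface__sense/POS, with the sense code optional (e.g. 학교__01/NNG+에/JKB). The helpers `parse_eojeol` and `vocabularies` are illustrative names, not part of UTagger. The sketch also shows one plausible way to count an MA vocabulary and a WSD vocabulary. A morpheme that appears with two different sense codes counts once without sense codes and twice with them.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

# Assumed morpheme shape: surface, optional "__<digits>" sense code, "/", uppercase POS tag.
MORPH_RE = re.compile(r"^(?P<surface>.+?)(?:__(?P<sense>\d+))?/(?P<pos>[A-Z]+)$")


@dataclass
class Morpheme:
    surface: str
    pos: str
    sense: Optional[str]  # None when the morpheme carries no sense code


def parse_eojeol(token: str) -> List[Morpheme]:
    """Split one space-separated word into morphemes (naive: a literal '+' morpheme would break it)."""
    morphs = []
    for part in token.split("+"):
        m = MORPH_RE.match(part)
        if m is None:
            raise ValueError(f"unrecognised morpheme: {part!r}")
        morphs.append(Morpheme(m["surface"], m["pos"], m["sense"]))
    return morphs


def vocabularies(sentences: Iterable[str]) -> Tuple[Set[tuple], Set[tuple]]:
    """Return (MA vocabulary, WSD vocabulary) for annotated sentences.

    MA entries are (surface, POS); WSD entries additionally keep the sense code.
    """
    ma: Set[tuple] = set()
    wsd: Set[tuple] = set()
    for sentence in sentences:
        for token in sentence.split():
            for m in parse_eojeol(token):
                ma.add((m.surface, m.pos))
                wsd.add((m.surface, m.sense, m.pos))
    return ma, wsd


if __name__ == "__main__":
    print(parse_eojeol("학교__01/NNG+에/JKB"))
    ma, wsd = vocabularies(["배__01/NNG+가/JKS", "배__02/NNG+가/JKS"])
    print(len(ma), len(wsd))
```

In the toy input, 배 appears with sense codes 01 and 02. It therefore yields one MA entry, (배, NNG), and two WSD entries. Adapt `MORPH_RE` once the real output format of UViko_WSD_MA_sample.txt has been checked.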
Created:
2019-11-11



