Korean embedding files using the different morphological segmentation granularity of the word
Indexed from NIAID Data Ecosystem on 2026-03-13
Download link:
https://zenodo.org/record/5869737
Resource description:
Embedding files using the following segmentation:
wordUD
morphUD
+morphUD
Under the wordUD segmentation, the corpus contains 9,692,938 sentences and 157,653,628 words (tokenized). It includes all articles published in The Hankyoreh during 2016 (1.2M sentences), the Sejong morphologically analyzed corpus (3M), and Korean Wiki (20201101) (5.3M). The fastText command:
./fasttext skipgram -input input -output embedding -dim 300
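With `-output embedding`, fastText writes a binary model (`embedding.bin`) and a text file (`embedding.vec`). The text file starts with a `count dim` header line, followed by one token per line with its vector components. The following is a minimal sketch of reading that text format in plain Python, assuming the files in this record follow it; the sample tokens and 3-dimensional values are invented for illustration and are not taken from the dataset, and the helper names `load_vec` and `cosine` are hypothetical.

```python
import io
import math

# Toy stand-in for a .vec file (made-up values; the real files use -dim 300).
SAMPLE = "2 3\n학교 0.1 0.2 0.3 \n학생 0.2 0.1 0.4 \n"


def load_vec(handle, limit=None):
    """Read word2vec-style text vectors from an open text handle into a dict."""
    header = handle.readline().split()
    dim = int(header[1])  # header is "<token count> <dimension>"
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed or empty-token lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
        if limit is not None and len(vectors) >= limit:
            break
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


if __name__ == "__main__":
    vecs = load_vec(io.StringIO(SAMPLE))
    print(sorted(vecs), cosine(vecs["학교"], vecs["학생"]))
```

For a real file, the same function could be called as `load_vec(open("embedding.vec", encoding="utf-8"), limit=100000)`, where the path is an assumption about how the downloaded file is named. Tokens must be segmented the same way as the embedding file being queried (wordUD, morphUD, or +morphUD), otherwise lookups will miss.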
Created:
2022-02-03



