300-Dimensional Word Embeddings for Nepali Language
Source: Mendeley Data. Updated 2024-01-31; indexed 2024-06-29.
Download link:
https://ieee-dataport.org/open-access/300-dimensional-word-embeddings-nepali-language
Description:
This pre-trained Word2Vec model provides 300-dimensional vectors for more than 0.5 million Nepali words and phrases. A separate Nepali text corpus was created from news content freely available in the public domain; it contains more than 90 million running words. The "Nepali Text Corpus" can be accessed freely from https://ieee-dataport.org/open-access/nepali-text-corpus.
Word2Vec model details:
Embedding dimension: 300
Architecture: Continuous Bag-of-Words (CBOW)
Training algorithm: negative sampling, with 15 negative samples
Context (window) size: 10
Minimum token count: 2
Encoding: UTF-8
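For readers who want to reproduce a comparable model, the listed settings can be sketched as keyword arguments for gensim's `Word2Vec` class. This is only an illustration under assumptions: the description does not say which toolkit or tokenizer was used, so the gensim 4.x parameter names, the whitespace tokenization, and the helper names `tokenize_lines` and `train_word2vec` are not from the source.

```python
# Hyperparameters from the model description, written with the keyword
# names of gensim 4.x's Word2Vec class. Assumption: the original model
# may have been trained with a different toolkit.
W2V_PARAMS = {
    "vector_size": 300,  # embedding dimension
    "sg": 0,             # 0 selects CBOW (1 would select skip-gram)
    "hs": 0,             # disable hierarchical softmax
    "negative": 15,      # negative samples drawn per training example
    "window": 10,        # context window size
    "min_count": 2,      # ignore tokens seen fewer than 2 times
}


def tokenize_lines(lines):
    """Split each non-empty line on whitespace.

    Hypothetical preprocessing: the source does not describe how the
    corpus was tokenized. Returns a list so it can be iterated repeatedly,
    which gensim needs for its multiple passes over the data.
    """
    return [line.split() for line in lines if line.strip()]


def train_word2vec(sentences, **overrides):
    """Train a Word2Vec model on a list of token lists (requires gensim)."""
    from gensim.models import Word2Vec  # imported lazily

    return Word2Vec(sentences=sentences, **{**W2V_PARAMS, **overrides})
```

With a UTF-8 copy of the corpus on disk, usage would look roughly like `train_word2vec(tokenize_lines(open(path, encoding="utf-8")))`, where `path` is a placeholder for wherever the corpus file is stored.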
Created:
2024-01-31



