UK Twitter word embeddings
Source: DataCite Commons. Updated 2020-09-03; indexed 2024-07-25.
Download link:
https://figshare.com/articles/dataset/UK_Twitter_word_embeddings/4052331
Description:
Word embeddings trained on Twitter content geo-located in the United Kingdom

Approximately 215 million tweets were used, dated from February 1, 2014 to March 31, 2016. Word2vec was applied as implemented in the gensim library (https://radimrehurek.com/gensim/).

Settings: continuous bag-of-words (CBOW), the entirety of a tweet as a window, negative sampling (5 noise words), and a dimensionality of 512.

After filtering out words with fewer than 500 occurrences, a vocabulary of 137,421 unigrams was obtained (see vocabulary.txt). The corresponding 512-dimensional embeddings are held in vectors.zip.
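As a rough illustration of how the two files might be used together, the sketch below pairs each vocabulary entry with the vector on the same line and ranks neighbours by cosine similarity. The file layout is an assumption, not something the listing states: it supposes that vocabulary.txt has one word per line (first whitespace-separated token) and that the unzipped vectors file has one whitespace-separated vector per line in the same order. The function names `load_embeddings`, `cosine`, and `most_similar` are hypothetical helpers, and the toy data stands in for the real 512-dimensional vectors.

```python
import io
import math


def load_embeddings(vocab_file, vectors_file):
    """Pair the word on line i with the vector on line i.

    Assumed layout (not confirmed by the dataset description):
    one word per vocabulary line, one whitespace-separated vector per line.
    """
    words = [line.split()[0] for line in vocab_file if line.strip()]
    vectors = [
        [float(x) for x in line.split()]
        for line in vectors_file
        if line.strip()
    ]
    if len(words) != len(vectors):
        raise ValueError("vocabulary and vector files differ in length")
    return dict(zip(words, vectors))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(embeddings, word, topn=3):
    """Return the topn (word, similarity) pairs closest to `word`."""
    query = embeddings[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Toy stand-in data; the real files hold 137,421 words and 512 dimensions.
toy_vocab = io.StringIO("london\nmanchester\ntea\n")
toy_vectors = io.StringIO("1.0 0.0 0.1\n0.9 0.1 0.1\n0.0 1.0 0.0\n")
toy = load_embeddings(toy_vocab, toy_vectors)
print(most_similar(toy, "london", topn=1))
```

With the real files, the two `io.StringIO` objects would be replaced by open file handles. For the full vocabulary a NumPy array or gensim's own loading utilities would likely be far more efficient than plain Python lists.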
Provider:
figshare
Created:
2016-10-22



