UK Twitter word embeddings

Source: DataCite Commons. Updated 2020-09-03; indexed 2024-07-25.
Download link:
https://figshare.com/articles/dataset/UK_Twitter_word_embeddings/4052331
Description:
Word embeddings trained on Twitter content geo-located in the United Kingdom.
The total number of tweets used was approximately 215 million, dated from February 1, 2014 to March 31, 2016. Word2vec was applied as implemented in the gensim library (https://radimrehurek.com/gensim/).
Settings: continuous bag-of-words (CBOW), the entirety of a tweet as the context window, negative sampling (5 noise words), and a dimensionality of 512.
After filtering out words with fewer than 500 occurrences, an embedding corpus of 137,421 unigrams was obtained (see vocabulary.txt). The corresponding 512-dimensional embeddings are held in vectors.zip.
Provider:
figshare
Created:
2016-10-22
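
The description names two files, vocabulary.txt and vectors.zip, but does not state their internal layout. The following is a minimal sketch of how the embeddings might be loaded and compared, assuming one word per line in the vocabulary file and one 512-value vector per line (whitespace- or comma-separated, in the same order) in the extracted vector file. The function names and the file layout are assumptions for illustration, not part of the dataset documentation, so the real files should be inspected first.

```python
import math
from typing import Dict, Iterable, List


def load_embeddings(
    vocab_lines: Iterable[str],
    vector_lines: Iterable[str],
    dim: int = 512,
) -> Dict[str, List[float]]:
    """Pair each vocabulary word with the vector at the same line position.

    ASSUMPTION: one word per line in the vocabulary file, one vector per
    line in the vector file, same ordering in both. This layout is not
    confirmed by the dataset description.
    """
    embeddings: Dict[str, List[float]] = {}
    for word_line, vec_line in zip(vocab_lines, vector_lines):
        word = word_line.strip()
        # Accept either commas or whitespace as the value separator.
        values = [float(x) for x in vec_line.replace(",", " ").split()]
        if len(values) != dim:
            raise ValueError(
                f"expected {dim} values for {word!r}, got {len(values)}"
            )
        embeddings[word] = values
    return embeddings


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Both arguments of load_embeddings accept any iterable of lines, so open file handles for vocabulary.txt and for whichever file is extracted from vectors.zip can be passed directly. The dimensionality check guards against a mismatch between the assumed layout and the actual files.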