
Word2vec models trained on English Wikipedia

Source: Mendeley Data. Updated: 2024-05-10. Indexed: 2024-06-27.
Download link:
https://zenodo.org/records/6542975
Resource description:
This repository contains Word2Vec models trained on the full text of the English Wikipedia as downloaded in December 2021.

Preprocessing:
- lowercasing
- n-grams up to 4-grams were computed following Bouma 2009 (https://svn.spraakdata.gu.se/repos/gerlof/pub/www/Docs/npmi-pfd.pdf), with a minimum frequency threshold of 10

Two models, trained with Gensim:
- wiki_300_5_word2vec: dim 300, freq threshold 5
- wiki_300_50_word2vec: dim 300, freq threshold 50

Other hyperparameters were set as follows: window=5, epochs=5, seed=1830, sg=1

Note: Machine learning models trained on uncurated data inevitably learn hidden or obvious biases. As a result, the models shared here might exhibit sexism, racism, antisemitism, homophobia, and other unacceptable biases. I encourage anyone using these models to make sure such biases are actually removed before using them in production settings (see e.g. https://aclanthology.org/N19-1061/).
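The n-gram step in the description cites Bouma 2009, which defines normalized pointwise mutual information (NPMI). The sketch below illustrates that score for adjacent word pairs on a toy corpus. It is not the author's preprocessing code: the function name `npmi_bigrams`, its `min_count` parameter, and the example sentences are assumptions made for illustration, and only bigrams are handled, whereas the description mentions n-grams up to 4-grams.

```python
import math
from collections import Counter


def npmi_bigrams(sentences, min_count=1):
    """Score adjacent word pairs with normalized PMI (Bouma 2009).

    Hypothetical helper for illustration only. Probabilities are
    estimated by dividing raw counts by the total number of tokens.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        tokens = [t.lower() for t in tokens]  # lowercasing, as in the description
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    total = sum(unigrams.values())
    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:  # drop rare pairs
            continue
        p_ab = count / total
        p_a = unigrams[a] / total
        p_b = unigrams[b] / total
        # NPMI = ln(p(a,b) / (p(a) p(b))) / -ln p(a,b), at most 1
        scores[(a, b)] = math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)
    return scores


toy_corpus = [
    ["New", "York", "is", "big"],
    ["new", "york", "is", "old"],
    ["the", "big", "city"],
]
for pair, score in sorted(npmi_bigrams(toy_corpus).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Pairs whose words always occur together score near 1, so a threshold on this score can decide which pairs to merge into a single token. Gensim's `Phrases` class also accepts `scoring="npmi"`; whether the released models were built that way is not stated in the description.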
Created:
2023-06-28