Italian Word2Vec models
Source: NIAID Data Ecosystem. Indexed: 2026-05-01
Download link:
https://zenodo.org/record/8367725
Description:
Italian Word2Vec models trained from scratch on a dataset composed of:
- wiki: a dump of Italian Wikipedia (as of December 15, 2022), comprising 25,548,651 sentences and 526,640,982 words (3.2 GB of raw text);
- webz: a dataset of Italian news (159,226 documents) from the webz.io platform, crawled in October 2015, containing 44,041,823 sentences and 44,544,385 words (244 MB);
- depending on the model, either a dataset of 5,510 Italian news articles from the newspaper ModenaToday (MT) or 15,115 documents from the Italian version of Reuters (RCV2).
w2v_wiki_wbz_mt_20_epochs.zip: Word2Vec model trained on the dataset consisting of wiki, webz, and MT for 20 epochs
w2v_wiki_wbz_mt_50_epochs.zip: Word2Vec model trained on the dataset consisting of wiki, webz, and MT for 50 epochs
w2v_wiki_wbz_reut_20_epochs.zip: Word2Vec model trained on the dataset consisting of wiki, webz, and RCV2 for 20 epochs
w2v_wiki_wbz_reut_50_epochs.zip: Word2Vec model trained on the dataset consisting of wiki, webz, and RCV2 for 50 epochs
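The archive names above follow a regular pattern (corpus tags, then the epoch count), so the training setup can be read off a file name programmatically. The following is a minimal sketch under that assumption. The helper `parse_archive_name`, the `ModelArchive` type, and the `CORPUS_TAGS` table are hypothetical names introduced here for illustration, not part of the Zenodo record, and the sketch does not open or load the archives themselves.

```python
import re
from typing import NamedTuple, Tuple

# Corpus tags as they appear in the archive names, mapped to the corpora
# described in the listing. Note that the file names use "wbz" for webz.
CORPUS_TAGS = {
    "wiki": "Italian Wikipedia dump",
    "wbz": "webz.io Italian news",
    "mt": "ModenaToday (MT) news articles",
    "reut": "Italian Reuters (RCV2) documents",
}


class ModelArchive(NamedTuple):
    """Training setup encoded in an archive name (hypothetical helper type)."""
    corpora: Tuple[str, ...]
    epochs: int


# Assumed pattern: w2v_<tag>_<tag>_..._<epochs>_epochs.zip
_NAME_RE = re.compile(
    r"^w2v_(?P<corpora>[a-z]+(?:_[a-z]+)*)_(?P<epochs>\d+)_epochs\.zip$"
)


def parse_archive_name(name: str) -> ModelArchive:
    """Split an archive name into its corpus tags and epoch count."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized archive name: {name!r}")
    tags = tuple(match.group("corpora").split("_"))
    unknown = [tag for tag in tags if tag not in CORPUS_TAGS]
    if unknown:
        raise ValueError(f"unknown corpus tag(s) in {name!r}: {unknown}")
    return ModelArchive(corpora=tags, epochs=int(match.group("epochs")))


if __name__ == "__main__":
    for archive in (
        "w2v_wiki_wbz_mt_20_epochs.zip",
        "w2v_wiki_wbz_mt_50_epochs.zip",
        "w2v_wiki_wbz_reut_20_epochs.zip",
        "w2v_wiki_wbz_reut_50_epochs.zip",
    ):
        info = parse_archive_name(archive)
        corpora = ", ".join(CORPUS_TAGS[tag] for tag in info.corpora)
        print(f"{archive}: {info.epochs} epochs on {corpora}")
```

Rejecting unknown tags rather than ignoring them keeps the sketch from silently mislabeling a file that does not follow the naming scheme listed here.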
Created:
2023-10-10



