Italian FastText models
Indexed by NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/8426106
Description:
Italian FastText models trained from scratch on a dataset composed of:
- wiki: a dump of Italian Wikipedia (as of December 15, 2022), comprising 25,548,651 sentences and 526,640,982 words (3.2 GB of raw text);
- webz: a dataset of Italian news (159,226 documents) from the webz.io platform, crawled in October 2015, containing 44,041,823 sentences and 44,544,385 words (244 MB);
- a third, smaller news corpus that depends on the model: either 5,510 articles from the Italian newspaper ModenaToday (MT) or 15,115 documents from the Italian version of Reuters (RCV2).
- ft_wiki_wbz_mt_20_epochs: FastText model trained on wiki, webz, and MT for 20 epochs;
- ft_wiki_wbz_mt_50_epochs: FastText model trained on wiki, webz, and MT for 50 epochs;
- ft_wiki_wbz_reut_20_epochs: FastText model trained on wiki, webz, and RCV2 for 20 epochs;
- ft_wiki_wbz_reut_50_epochs: FastText model trained on wiki, webz, and RCV2 for 50 epochs.
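The model names follow a regular pattern (corpus tags, then the epoch count), so the training setup can be recovered from a name alone. The sketch below is only an illustration of that convention: `parse_model_name`, `ModelSpec`, and `CORPUS_TAGS` are hypothetical helpers, not part of the release, and the mapping of the tags `wbz` and `reut` to webz and RCV2 is inferred from the descriptions above.

```python
import re
from typing import NamedTuple, Tuple

# Assumed mapping from the tags used in the model names to the corpora
# described above ("wbz" is taken to mean webz, "reut" to mean RCV2).
CORPUS_TAGS = {"wiki": "wiki", "wbz": "webz", "mt": "MT", "reut": "RCV2"}


class ModelSpec(NamedTuple):
    corpora: Tuple[str, ...]  # corpora the model was trained on
    epochs: int               # number of training epochs


def parse_model_name(name: str) -> ModelSpec:
    """Split a name such as 'ft_wiki_wbz_mt_20_epochs' into corpora and epochs."""
    match = re.fullmatch(r"ft_((?:[a-z]+_)+)(\d+)_epochs", name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name!r}")
    tags = match.group(1).rstrip("_").split("_")
    unknown = [tag for tag in tags if tag not in CORPUS_TAGS]
    if unknown:
        raise ValueError(f"unknown corpus tag(s) in {name!r}: {unknown}")
    return ModelSpec(tuple(CORPUS_TAGS[tag] for tag in tags), int(match.group(2)))


if __name__ == "__main__":
    for model in ("ft_wiki_wbz_mt_20_epochs", "ft_wiki_wbz_reut_50_epochs"):
        print(model, parse_model_name(model))
```

A helper like this can be handy when comparing the four models in a loop, for example to group evaluation results by corpus combination or by epoch count.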
Created:
2023-10-10



