Italian FastText models

Indexed by NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/8426106
Resource description:
Italian FastText models trained from scratch on a dataset composed of:
- wiki: a dump of Italian Wikipedia (as of December 15, 2022), comprising 25,548,651 sentences and 526,640,982 words (3.2 GB of raw text);
- webz: a dataset of Italian news (159,226 documents) from the webz.io platform, crawled in October 2015, containing 44,041,823 sentences and 44,544,385 words (244 MB);
- a dataset of 5,510 Italian news articles from the newspaper ModenaToday (MT) or 15,115 documents from the Italian version of Reuters (RCV2).

- ft_wiki_wbz_mt_20_epochs: FastText model trained on the dataset consisting of wiki, webz, and MT for 20 epochs
- ft_wiki_wbz_mt_50_epochs: FastText model trained on the dataset consisting of wiki, webz, and MT for 50 epochs
- ft_wiki_wbz_reut_20_epochs: FastText model trained on the dataset consisting of wiki, webz, and RCV2 for 20 epochs
- ft_wiki_wbz_reut_50_epochs: FastText model trained on the dataset consisting of wiki, webz, and RCV2 for 50 epochs
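
The model names encode both the training corpora and the number of epochs, so a short helper can make that explicit when choosing a model. The Python sketch below is illustrative only: `describe_model` and `load_vectors` are hypothetical helper names, and `load_vectors` assumes each model ships as a native fastText binary named `<model name>.bin`, which the record does not state. It relies on gensim's `load_facebook_vectors`, a third-party dependency.

```python
import re
from pathlib import Path

# Model names exactly as listed in the description.
MODEL_NAMES = [
    "ft_wiki_wbz_mt_20_epochs",
    "ft_wiki_wbz_mt_50_epochs",
    "ft_wiki_wbz_reut_20_epochs",
    "ft_wiki_wbz_reut_50_epochs",
]

# Tokens used in the model names, mapped to the corpus labels in the description.
CORPUS_TOKENS = {"wiki": "wiki", "wbz": "webz", "mt": "MT", "reut": "RCV2"}

NAME_PATTERN = re.compile(r"^ft_(?P<corpora>[a-z_]+)_(?P<epochs>\d+)_epochs$")


def describe_model(name):
    """Split a model name into its training corpora and epoch count."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name!r}")
    tokens = match.group("corpora").split("_")
    corpora = [CORPUS_TOKENS[token] for token in tokens]
    return {"corpora": corpora, "epochs": int(match.group("epochs"))}


def load_vectors(model_dir, name):
    """Load the word vectors of one model.

    Assumption: the file is a native fastText binary called <name>.bin.
    The record does not state the file format or extension.
    """
    # Imported lazily so the name helper works without gensim installed.
    from gensim.models.fasttext import load_facebook_vectors

    return load_facebook_vectors(str(Path(model_dir) / f"{name}.bin"))
```

If the files are indeed in that format, `load_vectors("models", "ft_wiki_wbz_mt_20_epochs")` would return a gensim `FastTextKeyedVectors` object (here `"models"` is a placeholder directory), on which `most_similar("città")` can be used to query nearest neighbours.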
Created:
2023-10-10