
Embeddings trained on CONLL2017 Corpora (conll2017-embeddings) - Part 1

Source: DataCite Commons | Updated: 2024-09-19 | Indexed: 2025-04-15
Download link:
https://fdat.uni-tuebingen.de/records/eh5fz-7ec28
Description:
The embeddings were trained with finalfrontier on the CONLL2017 corpora with more than 100M tokens. For all languages, embeddings were trained with the skipgram and structgram algorithms and contain subword n-grams. All embeddings are stored in the finalfusion format and can be used and processed with tools provided by the finalfusion ecosystem.
N-gram range (inclusive): 3 - 6
Number of hashing buckets: 2^21
Hashing function: FNV-1a
Window size: 10
Negative samples: 5
Dimensions: 300
Minimum token frequency: 30
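
To illustrate what the subword settings above mean in practice, here is a minimal Python sketch of how character n-grams of length 3 to 6 could be hashed with FNV-1a into 2^21 buckets. The FNV-1a constants are the standard 64-bit ones. Everything else is an assumption made for illustration: the function names, the `<` and `>` word-boundary markers, the UTF-8 byte encoding, and the reduction of the hash to a bucket by masking the low 21 bits. The bucket indices it produces are not guaranteed to match those computed by finalfrontier or finalfusion.

```python
# Standard 64-bit FNV-1a parameters.
FNV_OFFSET_BASIS_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3
MASK_64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """Return the 64-bit FNV-1a hash of a byte string."""
    h = FNV_OFFSET_BASIS_64
    for byte in data:
        h ^= byte                          # XOR the byte in first (the "1a" order)
        h = (h * FNV_PRIME_64) & MASK_64   # then multiply, wrapping at 64 bits
    return h


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> list[str]:
    """List all character n-grams of a word, lengths min_n..max_n inclusive.

    Assumption: the word is wrapped in '<' and '>' boundary markers first.
    """
    bracketed = f"<{word}>"
    return [
        bracketed[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(bracketed) - n + 1)
    ]


def bucket_index(ngram: str, buckets_exp: int = 21) -> int:
    """Map an n-gram to one of 2**buckets_exp buckets.

    Assumption: hash the UTF-8 bytes and keep the low buckets_exp bits.
    """
    return fnv1a_64(ngram.encode("utf-8")) & ((1 << buckets_exp) - 1)


if __name__ == "__main__":
    for ngram in char_ngrams("cat"):
        print(ngram, bucket_index(ngram))
```

With a fixed number of buckets, distinct n-grams can collide and share one embedding row; that is the usual trade-off of hashed subword vocabularies, which bound memory use at the cost of some sharing.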
Provider:
University of Tübingen
Created:
2024-09-19