Embeddings trained on CONLL2017 Corpora (conll2017-embeddings) - Part 1
Source: DataCite Commons | Updated: 2024-09-19 | Indexed: 2025-04-15
Download link:
https://fdat.uni-tuebingen.de/records/eh5fz-7ec28
Description:
The embeddings were trained with finalfrontier on the CONLL2017 corpora with more than 100M tokens. For all languages, embeddings were trained with both the skipgram and the structgram (structured skip-gram) algorithms, and all of them contain subword n-grams. All embeddings are stored in the finalfusion format and can be used and processed with the tools provided by the finalfusion ecosystem.
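The two training algorithms differ mainly in how a context word is represented. The following is a minimal sketch, not finalfrontier's implementation: it assumes a fixed symmetric window with no subsampling or dynamic window shrinking, and the function name `context_pairs` is hypothetical. In plain skip-gram a context word is predicted regardless of where it occurs; in structured skip-gram the context is tied to its relative position, which the sketch models by tagging each context with its offset.

```python
def context_pairs(tokens, window, structured=False):
    """Build (focus, context) training pairs (hypothetical helper).

    With structured=False this mirrors plain skip-gram: the context is
    just the neighbouring token. With structured=True the context is a
    (token, offset) tuple, mimicking structured skip-gram, where each
    relative position gets its own output representation.
    """
    pairs = []
    for i, focus in enumerate(tokens):
        # Clip the window at the sentence boundaries.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue  # a token is not its own context
            ctx = (tokens[j], j - i) if structured else tokens[j]
            pairs.append((focus, ctx))
    return pairs


if __name__ == "__main__":
    sentence = ["the", "cat", "sat"]
    print(context_pairs(sentence, window=1))
    print(context_pairs(sentence, window=1, structured=True))
```

Because structgram distinguishes "cat" one position to the left from "cat" one position to the right, it tends to encode more word-order (syntactic) information than plain skip-gram.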
N-Gram range (inclusive): 3 - 6
Number of hashing buckets: 2^21
Hashing function: FNV-1a
Window size: 10
Negative Samples: 5
Dimensions: 300
Minimum Token Frequency: 30
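The subword settings above (n-grams of length 3 to 6, FNV-1a hashing, 2^21 buckets) can be illustrated with a short sketch. It assumes that words are wrapped in `<` and `>` before n-gram extraction, that n-grams are taken over characters, and that each n-gram's UTF-8 bytes are hashed with 64-bit FNV-1a and masked down to 21 bits. The helper names (`fnv1a_64`, `char_ngrams`, `ngram_buckets`) are hypothetical, and the exact bytes that finalfusion feeds to the hash may differ, so these indices are not guaranteed to match the bucket rows in the released files.

```python
FNV64_OFFSET = 0xCBF29CE484222325  # standard 64-bit FNV offset basis
FNV64_PRIME = 0x100000001B3        # standard 64-bit FNV prime
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV64_OFFSET
    for b in data:
        h ^= b
        h = (h * FNV64_PRIME) & MASK64  # keep arithmetic in 64 bits
    return h


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> list:
    """Character n-grams of the bracketed word, lengths min_n..max_n inclusive."""
    bracketed = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(bracketed) - n + 1):
            grams.append(bracketed[i:i + n])
    return grams


def ngram_buckets(word: str, min_n: int = 3, max_n: int = 6, bucket_exp: int = 21) -> list:
    """Map each n-gram to one of 2**bucket_exp hash buckets."""
    mask = (1 << bucket_exp) - 1
    return [fnv1a_64(g.encode("utf-8")) & mask for g in char_ngrams(word, min_n, max_n)]


if __name__ == "__main__":
    print(char_ngrams("cat"))
    print(ngram_buckets("cat"))
```

For "cat" the bracketed form `<cat>` yields six n-grams (`<ca`, `cat`, `at>`, `<cat`, `cat>`, `<cat>`), each mapped to a bucket in [0, 2^21). Hashing into a fixed number of buckets keeps the subword matrix bounded in size at the cost of occasional collisions, and it lets a vector be composed from subword vectors even for words below the minimum token frequency.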
Provider:
University of Tübingen
Created:
2024-09-19



