Deep Reference Mining from Scholarly Literature in the Arts and Humanities - Pre-trained word embeddings
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/1175212
Description:
Pre-trained word vectors of dimensionality 100 and 300 for the publication: Deep Reference Mining from Scholarly Literature in the Arts and Humanities, submitted to Frontiers in Digital Humanities.
The corpus of scholarly publications on which these vectors were trained is under copyright; we therefore publish the trained vectors, rather than the corpus, for reproducibility. Please refer to the publication's repository for further details: https://github.com/dhlab-epfl/LinkedBooksDeepReferenceParsing.
These vectors were trained using Gensim 3.1.0. The corpus was preprocessed as follows:
Words were tokenized with the NLTK word_punct tokenizer.
Digits were converted into the $NUM$ token.
Words occurring fewer than 5 times, for every document, were converted to the $UNK$ token.
Vectors were trained using the function: Word2Vec(window=5, min_count=5, sg=1).
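The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the regular expression is the pattern used by NLTK's WordPunctTokenizer, reproduced here so the sketch needs only the standard library; "digits" is taken to mean tokens made up entirely of digits; and the frequency threshold is applied to counts over the whole list of documents passed in, which is only one possible reading of the description. The function names `tokenize`, `preprocess` and `train` are hypothetical.

```python
import re
from collections import Counter

# Same pattern as NLTK's WordPunctTokenizer: runs of word characters,
# or runs of non-word, non-space characters (punctuation).
TOKEN_RE = re.compile(r"\w+|[^\w\s]+")


def tokenize(text):
    """Split text into alphanumeric and punctuation tokens."""
    return TOKEN_RE.findall(text)


def preprocess(docs, min_freq=5):
    """Tokenize, map all-digit tokens to $NUM$, and rare tokens to $UNK$.

    Assumption: frequencies are counted over all of `docs` together.
    """
    tokenized = [
        ["$NUM$" if tok.isdigit() else tok for tok in tokenize(doc)]
        for doc in docs
    ]
    counts = Counter(tok for doc in tokenized for tok in doc)
    return [
        [tok if counts[tok] >= min_freq else "$UNK$" for tok in doc]
        for doc in tokenized
    ]


def train(sentences, dim=100):
    """Train skip-gram vectors with the parameters quoted above.

    Requires the third-party gensim package. The dimensionality keyword
    is `size` in Gensim 3.x and `vector_size` in Gensim 4.x.
    """
    import gensim
    from gensim.models import Word2Vec

    major = int(gensim.__version__.split(".")[0])
    size_kw = "vector_size" if major >= 4 else "size"
    return Word2Vec(sentences, window=5, min_count=5, sg=1, **{size_kw: dim})
```

Calling `train(preprocess(docs), dim=100)` and `train(preprocess(docs), dim=300)` on a list of document strings would correspond to the two published dimensionalities. Only the window, min_count and sg values come from the description; other Word2Vec settings are left at the library defaults here.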
Created:
2020-01-24



