Deep Reference Mining from Scholarly Literature in the Arts and Humanities - Pre-trained word embeddings

NIAID Data Ecosystem, indexed 2026-03-11

Download link:

https://zenodo.org/record/1175212

Description:

Pre-trained word vectors of dimensionality 100 and 300 for the publication: Deep Reference Mining from Scholarly Literature in the Arts and Humanities, submitted to Frontiers in Digital Humanities. The corpus of scholarly publications on which these vectors were trained is under copyright, so we publish the vectors themselves for reproducibility. Please refer to the publication's repository for further details: https://github.com/dhlab-epfl/LinkedBooksDeepReferenceParsing. These vectors were trained using Gensim 3.1.0. The corpus was preprocessed as follows: words were tokenized with the NLTK word_punct tokenizer; digits were converted into the $NUM$ token; words occurring fewer than 5 times, for every document, were converted to the $UNK$ token. Vectors were trained using the function: Word2Vec(window=5, min_count=5, sg=1).
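The preprocessing steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the function name preprocess and its min_freq parameter are hypothetical, the regular expression reproduces the pattern used by NLTK's WordPunctTokenizer so the sketch runs without NLTK installed, a "digit" is taken to mean a token consisting only of digits, and word frequency is counted within each document (one possible reading of "for every document").

```python
import re
from collections import Counter

# Same pattern as NLTK's WordPunctTokenizer: runs of word characters,
# or runs of non-space punctuation.
TOKEN_RE = re.compile(r"\w+|[^\w\s]+")


def preprocess(text, min_freq=5):
    """Tokenize one document and apply the $NUM$ / $UNK$ replacements.

    Hypothetical helper; the name and the min_freq parameter are assumptions.
    """
    tokens = TOKEN_RE.findall(text)
    # Assumption: a token made up entirely of digits becomes $NUM$.
    tokens = ["$NUM$" if t.isdigit() else t for t in tokens]
    # Assumption: frequencies are counted within this single document.
    counts = Counter(tokens)
    # Assumption: $NUM$ is kept even when it is itself rare.
    return [
        t if t == "$NUM$" or counts[t] >= min_freq else "$UNK$"
        for t in tokens
    ]


if __name__ == "__main__":
    print(preprocess("In 1999, the the the the the cat."))
```

A list of such token lists is the kind of input that Gensim's Word2Vec accepts as its sentences argument, alongside the parameters quoted above (window=5, min_count=5, sg=1). Note that the keyword for vector dimensionality is size in Gensim 3.x and vector_size in Gensim 4.x.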

Created:

2020-01-24