
Deep Reference Mining from Scholarly Literature in the Arts and Humanities - Pre-trained word embeddings

Indexed by NIAID Data Ecosystem on 2026-03-11
Download link:
https://zenodo.org/record/1175212
Description:
Pre-trained word vectors of dimensionality 100 and 300 for the publication "Deep Reference Mining from Scholarly Literature in the Arts and Humanities", submitted to Frontiers in Digital Humanities. The corpus of scholarly publications on which these vectors were trained is under copyright. We therefore publish the vectors, rather than the corpus, for reproducibility. Please refer to the publication's repository for further details: https://github.com/dhlab-epfl/LinkedBooksDeepReferenceParsing.

These vectors were trained using Gensim 3.1.0. The corpus was preprocessed as follows:
- words were tokenized with the NLTK word_punct tokenizer;
- digits were converted into the $NUM$ token;
- within each document, words occurring fewer than 5 times were converted into the $UNK$ token.

The vectors were then trained using the function Word2Vec(window=5, min_count=5, sg=1).
Created:
2020-01-24