Wembedder Wikidata-20170613-Truthy-Beta-Cbow-Size=100-Window=1-Min_Count=20-Iter=25
Source: Zenodo. Updated 2020-09-18. Indexed 2026-05-25.
Download link:
https://zenodo.org/record/827339
Description:
Wikidata embedding
==================

Gensim model:
wikidata-20170613-truthy-BETA-cbow-size=100-window=1-min_count=20-iter=25
Download of Wikidata from::
https://dumps.wikimedia.org/wikidatawiki/entities/
Trigram construction::

    from bz2 import BZ2File
    import re

    dump_filename = 'wikidata-20170613-truthy-BETA.nt.bz2'
    trigram_filename = 'wikidata-20170613-truthy-BETA.trigrams'

    # Match only item-property-item triples (Q-item, direct property, Q-item).
    pattern = re.compile(
        (r'^<http://www.wikidata.org/entity/(Q\d+)> '
         r'<http://www.wikidata.org/prop/direct/(P\d+)> '
         r'<http://www.wikidata.org/entity/(Q\d+)>'),
        flags=re.UNICODE)

    # Write one "Q... P... Q..." trigram per matching line of the dump.
    with open(trigram_filename, 'w') as f:
        for line in BZ2File(dump_filename):
            line = line.decode('utf-8')
            match = pattern.search(line)
            if match:
                f.write(" ".join(match.groups()) + '\n')
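
To illustrate what the trigram step keeps and what it drops, here is a minimal sketch that applies an equivalent pattern to two hand-written N-Triples lines. The sample lines and the helper name `to_trigram` are assumptions made for this illustration and are not part of the original script; the dots in the pattern are escaped here, which the original does not do.

```python
import re

# Equivalent to the pattern used above, with the dots escaped.
pattern = re.compile(
    r'^<http://www\.wikidata\.org/entity/(Q\d+)> '
    r'<http://www\.wikidata\.org/prop/direct/(P\d+)> '
    r'<http://www\.wikidata\.org/entity/(Q\d+)>')

# Hand-written sample lines (illustrative, not copied from the dump).
sample_lines = [
    # Item-to-item statement: kept.
    '<http://www.wikidata.org/entity/Q42> '
    '<http://www.wikidata.org/prop/direct/P31> '
    '<http://www.wikidata.org/entity/Q5> .',
    # Label with a literal object: dropped, because the object is not a Q-item.
    '<http://www.wikidata.org/entity/Q42> '
    '<http://www.w3.org/2000/01/rdf-schema#label> '
    '"Douglas Adams"@en .',
]


def to_trigram(line):
    """Return 'Q... P... Q...' for an item-property-item triple, else None."""
    match = pattern.search(line)
    return " ".join(match.groups()) if match else None


trigrams = [to_trigram(line) for line in sample_lines]
print(trigrams)
```

Only statements whose subject and object are both Q-items survive, so literals such as labels, dates and quantities never reach the embedding vocabulary.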

Construction of Gensim model::

    import logging
    from gensim.models import Word2Vec
    from gensim.models.word2vec import LineSentence

    logging.basicConfig(
        format='%(asctime)s : %(levelname)s : %(message)s',
        level=logging.INFO)

    # Each line of the trigram file is treated as one three-token sentence.
    sentences = LineSentence('wikidata-20170613-truthy-BETA.trigrams')
    filename = 'wikidata-20170613-truthy-BETA-cbow-size=100-window=1-min_count=20-iter=25'

    # `size` and `iter` are the parameter names in Gensim releases before 4.0;
    # Gensim 4.0 renamed them to `vector_size` and `epochs`.
    w2v = Word2Vec(sentences, size=100, window=1, min_count=20, workers=10, iter=25)
    w2v.save(filename)
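
The combination of three-token sentences and `window=1` determines which tokens can serve as context for which target. The sketch below enumerates those context/target pairs for a single trigram in plain Python. It is only an illustration of the windowing idea, not Gensim's implementation: the function name `cbow_contexts` is hypothetical, and the sketch ignores `min_count` filtering, subsampling and negative sampling.

```python
def cbow_contexts(tokens, window=1):
    """Return (context_tokens, target_token) pairs for one sentence."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]      # up to `window` tokens before
        right = tokens[i + 1:i + 1 + window]     # up to `window` tokens after
        pairs.append((left + right, target))
    return pairs


trigram = "Q42 P31 Q5".split()
pairs = cbow_contexts(trigram)
for context, target in pairs:
    print(context, "->", target)
```

Under these assumptions the property sees both items as context, while each item sees only the property. Two items in the same statement therefore never appear in each other's window, and are related only through the properties they share.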
Provider:
Zenodo
Created:
2017-07-14



