Wembedder wikidata-20170613-truthy-BETA-cbow-size=100-window=1-min_count=20-iter=25
Source: Mendeley Data · Updated: 2024-03-27 · Indexed: 2024-06-27
Download link:
https://zenodo.org/record/827339
Resource description:
Wikidata embedding
==================

Gensim model: wikidata-20170613-truthy-BETA-cbow-size=100-window=1-min_count=20-iter=25

Download of Wikidata from::

    https://dumps.wikimedia.org/wikidatawiki/entities/

Trigram construction::

    import re
    from bz2 import BZ2File

    dump_filename = 'wikidata-20170613-truthy-BETA.nt.bz2'
    trigram_filename = 'wikidata-20170613-truthy-BETA.trigrams'

    # Match N-Triples lines of the form <entity Q> <direct property P> <entity Q>
    # and capture the three identifiers.
    pattern = re.compile(
        (r'^<http://www\.wikidata\.org/entity/(Q\d+)> '
         r'<http://www\.wikidata\.org/prop/direct/(P\d+)> '
         r'<http://www\.wikidata\.org/entity/(Q\d+)>'),
        flags=re.UNICODE)

    with open(trigram_filename, 'w', encoding='utf-8') as f, \
            BZ2File(dump_filename) as dump:
        for line in dump:
            line = line.decode('utf-8')
            match = pattern.search(line)
            if match:
                # One "sentence" per triple: subject property object
                f.write(" ".join(match.groups()) + '\n')

Construction of Gensim model::

    import logging

    from gensim.models import Word2Vec
    from gensim.models.word2vec import LineSentence

    logging.basicConfig(
        format='%(asctime)s : %(levelname)s : %(message)s',
        level=logging.INFO)

    sentences = LineSentence('wikidata-20170613-truthy-BETA.trigrams')
    filename = 'wikidata-20170613-truthy-BETA-cbow-size=100-window=1-min_count=20-iter=25'

    # CBOW is gensim's default training mode (sg=0). `size` and `iter` are the
    # gensim 3.x argument names; gensim 4.0 renamed them `vector_size` and `epochs`.
    w2v = Word2Vec(sentences, size=100, window=1, min_count=20,
                   workers=10, iter=25)
    w2v.save(filename)
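To make the trigram and CBOW steps concrete, here is a minimal, self-contained sketch (not part of the original recipe) that applies the same regular expression to two hand-written N-Triples lines and then lists the (context, target) pairs a CBOW model with window=1 would train on for the resulting trigram. The sample lines and the helper names `extract_trigram` and `cbow_examples` are hypothetical, and the sketch deliberately ignores gensim's min_count pruning and frequent-word downsampling.

```python
import re

# Same pattern as in the trigram-construction step: subject entity,
# direct ("truthy") property, object entity.
PATTERN = re.compile(
    r'^<http://www\.wikidata\.org/entity/(Q\d+)> '
    r'<http://www\.wikidata\.org/prop/direct/(P\d+)> '
    r'<http://www\.wikidata\.org/entity/(Q\d+)>')


def extract_trigram(line):
    """Return (subject, property, object), or None if the line does not match."""
    match = PATTERN.search(line)
    return match.groups() if match else None


def cbow_examples(tokens, window=1):
    """List the (context, target) pairs CBOW would see for one sentence.

    Simplified sketch: no min_count pruning, no downsampling.
    """
    examples = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        context = tokens[lo:i] + tokens[i + 1:i + 1 + window]
        if context:
            examples.append((context, target))
    return examples


# Hypothetical sample lines in the style of the truthy N-Triples dump.
sample = [
    '<http://www.wikidata.org/entity/Q42> '
    '<http://www.wikidata.org/prop/direct/P31> '
    '<http://www.wikidata.org/entity/Q5> .',
    # Literal object (a date): not entity-to-entity, so the pattern skips it.
    '<http://www.wikidata.org/entity/Q42> '
    '<http://www.wikidata.org/prop/direct/P569> '
    '"1952-03-11T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> .',
]

trigrams = [t for t in map(extract_trigram, sample) if t is not None]
for trigram in trigrams:
    print(' '.join(trigram))
    for context, target in cbow_examples(list(trigram)):
        print('   ', context, '->', target)
```

With window=1 the property token in the middle is predicted from both entities, while each entity is predicted only from the property next to it; the subject and object never fall inside each other's window directly.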
Created:
2023-06-28



