
Wembedder wikidata-20170613-truthy-BETA-cbow-size=100-window=1-min_count=20-iter=25

Source: Mendeley Data. Updated 2024-03-27. Indexed 2024-06-27.
Download link:
https://zenodo.org/record/827339
Resource description:
Wikidata embedding
==================

Gensim model: wikidata-20170613-truthy-BETA-cbow-size=100-window=1-min_count=20-iter=25

Download of Wikidata from::

    https://dumps.wikimedia.org/wikidatawiki/entities/

Trigram construction::

    from bz2 import BZ2File
    import re

    dump_filename = 'wikidata-20170613-truthy-BETA.nt.bz2'
    trigram_filename = 'wikidata-20170613-truthy-BETA.trigrams'

    # Match only statements that link one item (Q...) to another item
    # through a direct property (P...); all other lines are skipped.
    pattern = re.compile(
        (r'^<http://www.wikidata.org/entity/(Q\d+)> '
         r'<http://www.wikidata.org/prop/direct/(P\d+)> '
         r'<http://www.wikidata.org/entity/(Q\d+)>'),
        flags=re.UNICODE)

    # Write one "subject property object" trigram per matching line.
    with open(trigram_filename, 'w') as f:
        for line in BZ2File(dump_filename):
            line = line.decode('utf-8')
            match = pattern.search(line)
            if match:
                f.write(" ".join(match.groups()) + '\n')

Construction of Gensim model::

    import logging

    from gensim.models import Word2Vec
    from gensim.models.word2vec import LineSentence

    logging.basicConfig(
        format='%(asctime)s : %(levelname)s : %(message)s',
        level=logging.INFO)

    # Each line of the trigram file is treated as one three-token sentence.
    sentences = LineSentence('wikidata-20170613-truthy-BETA.trigrams')

    filename = 'wikidata-20170613-truthy-BETA-cbow-size=100-window=1-min_count=20-iter=25'

    # Keyword names as used by Gensim before 4.0; Gensim 4.0 renamed
    # `size` to `vector_size` and `iter` to `epochs`.
    w2v = Word2Vec(sentences, size=100, window=1, min_count=20,
                   workers=10, iter=25)
    w2v.save(filename)
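To make the trigram step concrete, the following is a minimal sketch that applies the same pattern to a few in-memory lines instead of the full dump. The sample lines are made up for illustration and are not taken from the dump file; the helper name `to_trigram` is hypothetical, and the dots in the pattern are escaped here as an editorial tweak that does not change which well-formed lines match.

```python
import re

# Same pattern as in the description, with the dots escaped (editorial tweak).
PATTERN = re.compile(
    r'^<http://www\.wikidata\.org/entity/(Q\d+)> '
    r'<http://www\.wikidata\.org/prop/direct/(P\d+)> '
    r'<http://www\.wikidata\.org/entity/(Q\d+)>'
)


def to_trigram(line):
    """Return 'Q... P... Q...' for an item-to-item statement, else None."""
    match = PATTERN.search(line)
    return " ".join(match.groups()) if match else None


# Hypothetical sample lines in N-Triples form, written for this example.
SAMPLE_LINES = [
    # Item-to-item statement: kept.
    '<http://www.wikidata.org/entity/Q42> '
    '<http://www.wikidata.org/prop/direct/P31> '
    '<http://www.wikidata.org/entity/Q5> .',
    # Literal-valued statement (a date): dropped, the object is not an item.
    '<http://www.wikidata.org/entity/Q42> '
    '<http://www.wikidata.org/prop/direct/P569> '
    '"1952-03-11T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> .',
    # Label triple: dropped, the predicate is not a direct property.
    '<http://www.wikidata.org/entity/Q42> '
    '<http://www.w3.org/2000/01/rdf-schema#label> '
    '"Douglas Adams"@en .',
]

trigrams = [t for t in map(to_trigram, SAMPLE_LINES) if t is not None]
print(trigrams)  # ['Q42 P31 Q5']
```

Only the first line survives, so the training corpus consists purely of item-property-item triples. With `window=1`, each token's context reaches at most one position to either side, so a property is trained against both the subject and the object, while each item is trained against the adjacent property only.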
Created:
2023-06-28