
Wikidata 3 Topical Subsets (Gene Wiki, Music, Ships) and 4 Random Subsets

Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/7332160
Description:
This dataset contains the N-Triples files of 3 Wikidata topical subsets corresponding to 3 Wikidata WikiProjects (Gene Wiki, Music, and Ships), along with 4 random subsets of different sizes: two of 100K items, one of 500K items, and one of 1M items. All subsets were extracted from the 3 January 2022 dump with WDumper, using these JSON specification files.

The files are:

- GeneWiki.zip: contains 25 `.nt.gz` RDF files, each of which corresponds to one of the main Gene Wiki WikiProject classes, e.g. protein, gene, chemical compound, etc.
- music.nt.gz: the RDF file corresponding to the Music WikiProject.
- ships.nt.gz: the RDF file corresponding to the Ships WikiProject.
- Random100K_1.zip: contains 2 `.nt.gz` RDF files, each of which includes (about) 50,000 random Wikidata items, 100,000 items in total.
- Random100K_2.zip: contains 2 `.nt.gz` RDF files, each of which includes (about) 50,000 random Wikidata items, 100,000 items in total.
- Random500K.zip: contains 10 `.nt.gz` RDF files, each of which includes (about) 50,000 random Wikidata items, 500,000 items in total.
- Random1M.zip: contains 20 `.nt.gz` RDF files, each of which includes (about) 50,000 random Wikidata items, 1,000,000 items in total.
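Because the zip archives bundle several gzip-compressed N-Triples files, they can be read without unpacking them to disk. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and it assumes that item subjects use the `http://www.wikidata.org/entity/Q…` IRI form and that each archive member name ends in `.nt.gz`; neither is stated in the description above, so check them against the actual files.

```python
import gzip
import zipfile

# Assumption: Wikidata items appear as subjects with this IRI prefix.
ENTITY_PREFIX = "<http://www.wikidata.org/entity/Q"


def iter_ntriples_lines(zip_path):
    """Yield (member_name, line) for each triple line in every .nt.gz member."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".nt.gz"):
                continue  # skip anything that is not a compressed N-Triples file
            with archive.open(name) as raw:
                # Decompress the member on the fly and decode it as UTF-8 text.
                with gzip.open(raw, mode="rt", encoding="utf-8") as text:
                    for line in text:
                        line = line.strip()
                        # Skip blank lines and N-Triples comments.
                        if line and not line.startswith("#"):
                            yield name, line


def count_item_subjects(zip_path):
    """Count distinct subjects that look like Wikidata items (Q-ids)."""
    subjects = set()
    for _name, line in iter_ntriples_lines(zip_path):
        # The subject is the first whitespace-delimited term of a triple.
        subject = line.split(None, 1)[0]
        if subject.startswith(ENTITY_PREFIX):
            subjects.add(subject)
    return len(subjects)
```

Calling `count_item_subjects("Random100K_1.zip")` on a downloaded archive gives a rough check against the stated item total. The count may differ from it, since the description gives the per-file sizes only as approximate and an item is counted only if it occurs as a triple subject.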
Created:
2022-11-21