
WikiDBs 10k - A Corpus Of Relational Databases From Wikidata

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/8227451
Description:
WikiDBs-10k (https://wikidbs.github.io/) is a corpus of relational databases built from Wikidata (https://www.wikidata.org/). This is the preliminary 10k version; the newer version with 100k databases (https://zenodo.org/records/11559814) includes more coherent databases and more diverse table and column names.

The WikiDBs-10k corpus consists of 10,000 databases. For more details, see our paper: https://ceur-ws.org/Vol-3462/TADA3.pdf (TaDA@VLDB'23). Each database is saved in a sub-folder, with the tables provided as CSV files and the database schema as a JSON file.

We thank Till Döhmen and Madelon Hulsebos for generously providing the table statistics from their GitSchemas dataset, and Jan-Micha Bodensohn for converting the dataset to SQLite files. This work has been supported by the BMBF and the state of Hesse as part of the NHR Program and the BMBF project KompAKI (grant number 02L19C150), as well as the HMWK cluster project 3AI. Finally, we thank hessian.AI and DFKI Darmstadt for their support.
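The layout described above (one sub-folder per database, tables as CSV files, schema as a JSON file) can be read with a few lines of Python. The following is a minimal sketch, not the authors' loader: the function name `load_database` is hypothetical, and it assumes that any `*.csv` file below the database folder is a table and that the first `*.json` file found is the schema. The actual file names and nesting inside the archive are not stated here and may differ.

```python
import csv
import json
from pathlib import Path


def load_database(db_dir):
    """Load one database folder into (tables, schema).

    Hypothetical helper; the real archive layout may differ from
    the assumptions noted below.
    """
    db_dir = Path(db_dir)

    # Assumption: every *.csv file below the folder is one table,
    # and the file stem is the table name.
    tables = {}
    for csv_path in sorted(db_dir.rglob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as f:
            tables[csv_path.stem] = list(csv.DictReader(f))

    # Assumption: the folder holds a single JSON schema file;
    # if several exist, the first in sorted order is used.
    schema = None
    schema_files = sorted(db_dir.rglob("*.json"))
    if schema_files:
        with schema_files[0].open(encoding="utf-8") as f:
            schema = json.load(f)

    return tables, schema
```

Calling `load_database` on one extracted sub-folder returns a dictionary mapping table names to lists of row dictionaries, plus the parsed schema (or `None` if no JSON file is found). All cell values stay strings, since the sketch does not interpret the schema's column types.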
Created:
2024-11-20