WikiDBs - A Large-Scale Corpus Of Relational Databases From Wikidata
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/11559813
Description:
WikiDBs is an open-source corpus of 100,000 relational databases. We aim to support research on tabular representation learning on multi-table data. The corpus is based on Wikidata and aims to follow certain characteristics of real-world databases.
WikiDBs was published as a spotlight paper in the Datasets and Benchmarks track at NeurIPS 2024.
WikiDBs contains the database schemas as well as the table contents. The database tables are provided as CSV files, and each database schema is provided as JSON. The 100,000 databases are available in five splits of 20,000 databases each. The full corpus requires around 165 GB of disk space. We also provide a script to convert the databases into SQLite.
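To illustrate what loading one database's CSV tables into SQLite might look like, here is a minimal Python sketch. It is not the conversion script shipped with the corpus. It assumes, hypothetically, that each database is a folder holding one CSV file per table with a header row, and it stores every column as TEXT instead of reading types or keys from the JSON schema. The function name `csv_dir_to_sqlite` and the paths in the usage note are made up for illustration.

```python
import csv
import sqlite3
from pathlib import Path


def quote_ident(name: str) -> str:
    """Quote a table or column name so it is safe to embed in SQL."""
    return '"' + name.replace('"', '""') + '"'


def csv_dir_to_sqlite(csv_dir, sqlite_path):
    """Load every *.csv file in csv_dir into one SQLite file, one table per CSV.

    Assumptions (not taken from the corpus documentation): each CSV has a
    header row, every data row has as many fields as the header, and all
    columns are stored as TEXT.
    """
    csv_dir = Path(csv_dir)
    conn = sqlite3.connect(str(sqlite_path))
    loaded = []
    try:
        for csv_file in sorted(csv_dir.glob("*.csv")):
            with csv_file.open(newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                header = next(reader, None)
                if not header:
                    continue  # skip empty files
                table = quote_ident(csv_file.stem)
                columns = ", ".join(f"{quote_ident(c)} TEXT" for c in header)
                placeholders = ", ".join("?" for _ in header)
                conn.execute(f"DROP TABLE IF EXISTS {table}")
                conn.execute(f"CREATE TABLE {table} ({columns})")
                # Remaining rows of the reader are the table contents.
                conn.executemany(
                    f"INSERT INTO {table} VALUES ({placeholders})", reader
                )
                loaded.append(csv_file.stem)
        conn.commit()
    finally:
        conn.close()
    return loaded
```

A call such as `csv_dir_to_sqlite("some_database/tables", "some_database.sqlite")` (both paths are placeholders) returns the list of table names it created. For faithful column types and foreign keys, the JSON schema and the conversion script provided with the corpus are the better starting point.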
Created:
2024-12-12



