five

WikiDBs - A Large-Scale Corpus Of Relational Databases From Wikidata

收藏
NIAID Data Ecosystem2026-05-02 收录
下载链接:
https://zenodo.org/record/11559813
下载链接
链接失效反馈
官方服务:
资源简介:
WikiDBs  is an open-source corpus of 100,000 relational databases. We aim to support research on tabular representation learning on multi-table data. The corpus is based on Wikidata and aims to follow certain characteristics of real-world databases. WikiDBs was published as a spotlight paper at the Dataset & Benchmarks track at NeurIPS 2024. WikiDBs contains the database schemas, as well as table contents. The database tables are provided as CSV files, and each database schema as JSON. The 100,000 databases are available in five splits, containing 20k databases each. In total, around 165 GB of disk space are needed for the full corpus. We also provide a script to convert the databases into SQLite.
创建时间:
2024-12-12
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作