kvignesh1420/plurel

Source: Hugging Face. Updated 2026-02-28; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/kvignesh1420/plurel
Description:
---
license: mit
task_categories:
- tabular-classification
- tabular-regression
- feature-extraction
pretty_name: PluRel – Synthetic Relational Databases
size_categories:
- 1B<n<10B
tags:
- relational-data
- synthetic-data
- foundation-models
- tabular
- pretraining
- structural-causal-model
- relbench
---

# PluRel Dataset

**Synthetic Data unlocks Scaling Laws for Relational Foundation Models**

[![arXiv](https://img.shields.io/badge/arXiv-2602.04029-b31b1b?style=flat&logo=arxiv)](https://arxiv.org/abs/2602.04029)
[![Project Page](https://img.shields.io/badge/Project-Page-blue?style=flat&logo=github)](https://snap-stanford.github.io/plurel/)
[![GitHub](https://img.shields.io/badge/Code-GitHub-black?style=flat&logo=github)](https://github.com/snap-stanford/plurel)
[![Checkpoints](https://img.shields.io/badge/Checkpoints-HuggingFace-yellow?style=flat&logo=huggingface)](https://huggingface.co/kvignesh1420/relational-transformer-plurel)

Preprocessed synthetic relational databases for pretraining relational foundation models, as introduced in:

> **PluRel: Synthetic Data unlocks Scaling Laws for Relational Foundation Models**
> Kothapalli, Ranjan, Hudovernik, Dwivedi, Hoffart, Guestrin, Leskovec (arXiv:2602.04029, 2026)

---

## Data Structure

Each entry is a [relbench](https://github.com/snap-stanford/relbench)-compatible `Database` consisting of multiple relational tables.

| Component | Description |
|-----------|-------------|
| Tables | 3–20 per database |
| Primary keys | `row_idx` (auto-generated) |
| Foreign keys | `foreign_row_0`, `foreign_row_1`, ... |
| Feature columns | `feature_0`, `feature_1`, ... (categorical or numerical) |
| Time column | `date`; present on activity (leaf) tables only |

**Schema topology** is sampled from: BarabasiAlbert, ReverseRandomTree, or WattsStrogatz graphs.

**Data generation** uses Structural Causal Models (SCMs): column dependencies are modeled as DAGs, with values propagated through randomly initialized MLPs. Activity tables also include trend + cycle + noise time-series.

| Parameter | Range |
|-----------|-------|
| Rows per entity table | 500–1,000 |
| Rows per activity table | 2,000–5,000 |
| Columns per table | 3–40 (power-law) |
| Missing values | 1–10% of numerical columns |
| Timestamp range | 1990–2025 |
| Train / Val / Test | 80% / 10% / 10% |

---

## Download

```bash
huggingface-cli download kvignesh1420/plurel \
  --repo-type dataset \
  --local-dir ~/scratch/pre
```

---

## Usage

Databases are named `rel-synthetic-<seed>` and are fully reproducible:

```python
from plurel import SyntheticDataset, Config

# Fixed seed with the default config; databases are reproducible per seed.
dataset = SyntheticDataset(seed=42, config=Config())
db = dataset.make_db()

# Print each table's name and the (rows, columns) shape of its DataFrame.
for name, table in db.tables.items():
    print(f"{name}: {table.df.shape}")
```

See [snap-stanford/plurel](https://github.com/snap-stanford/plurel) for installation, configuration, and training scripts.

---

## Related

| Resource | Link |
|----------|------|
| Pretrained checkpoints | [kvignesh1420/relational-transformer-plurel](https://huggingface.co/kvignesh1420/relational-transformer-plurel) |
| Real-world relbench data | [hvag976/relational-transformer](https://huggingface.co/datasets/hvag976/relational-transformer) |

---

## Citation

```bibtex
@article{kothapalli2026plurel,
  title={{PluRel:} Synthetic Data unlocks Scaling Laws for Relational Foundation Models},
  author={Kothapalli, Vignesh and Ranjan, Rishabh and Hudovernik, Valter and Dwivedi, Vijay Prakash and Hoffart, Johannes and Guestrin, Carlos and Leskovec, Jure},
  journal={arXiv preprint arXiv:2602.04029},
  year={2026}
}
```
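
---

The column conventions and generation recipe described under Data Structure can be illustrated with a small, self-contained toy. This is only a sketch built with NumPy and pandas, not PluRel's generator: the column names (`row_idx`, `foreign_row_0`, `feature_*`, `date`) and row counts follow the tables above, while the helper `random_mlp`, the MLP width, the noise scale, and the cycle frequency are hypothetical choices made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the toy data is reproducible


def random_mlp(in_dim, hidden=8):
    """Return a function applying a tiny MLP with randomly initialized weights.

    Hypothetical helper for this sketch; not part of the plurel package.
    """
    w1 = rng.normal(size=(in_dim, hidden))
    w2 = rng.normal(size=(hidden, 1))
    return lambda x: np.tanh(x @ w1) @ w2


# Entity table: `row_idx` is the primary key. feature_0 is a root node of the
# column DAG; feature_1 depends on feature_0 through a random MLP plus noise.
n_entities = 500
feature_0 = rng.normal(size=(n_entities, 1))
feature_1 = random_mlp(1)(feature_0) + 0.1 * rng.normal(size=(n_entities, 1))
entities = pd.DataFrame({
    "row_idx": np.arange(n_entities),
    "feature_0": feature_0[:, 0],
    "feature_1": feature_1[:, 0],
})

# Activity (leaf) table: a foreign key into the entity table, a `date` column,
# and one feature built as trend + cycle + noise over normalized time t.
n_events = 2000
t = np.linspace(0.0, 1.0, n_events)
activity = pd.DataFrame({
    "row_idx": np.arange(n_events),
    "foreign_row_0": rng.integers(0, n_entities, size=n_events),
    "date": pd.date_range("1990-01-01", "2025-12-31", periods=n_events),
    "feature_0": 2.0 * t + np.sin(2 * np.pi * 12 * t) + 0.1 * rng.normal(size=n_events),
})

# Resolve the foreign key: each activity row picks up its parent entity's features.
joined = activity.merge(
    entities,
    left_on="foreign_row_0",
    right_on="row_idx",
    suffixes=("", "_entity"),
    validate="many_to_one",  # raises if row_idx is not unique in `entities`
)
print(joined.shape)
```

The same join pattern should carry over to tables loaded from a generated database, since each table exposes a pandas DataFrame via `table.df` in the Usage snippet; which table a given `foreign_row_k` column points to is not stated here and would need to be read from the database's own metadata.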
Provider:
kvignesh1420