kvignesh1420/plurel
Source: Hugging Face · Updated 2026-02-28 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/kvignesh1420/plurel
Description:
---
license: mit
task_categories:
- tabular-classification
- tabular-regression
- feature-extraction
pretty_name: PluRel – Synthetic Relational Databases
size_categories:
- 1B<n<10B
tags:
- relational-data
- synthetic-data
- foundation-models
- tabular
- pretraining
- structural-causal-model
- relbench
---
# PluRel Dataset
**Synthetic Data unlocks Scaling Laws for Relational Foundation Models**
[Paper (arXiv)](https://arxiv.org/abs/2602.04029) · [Project page](https://snap-stanford.github.io/plurel/) · [Code (GitHub)](https://github.com/snap-stanford/plurel) · [Pretrained checkpoints](https://huggingface.co/kvignesh1420/relational-transformer-plurel)
Preprocessed synthetic relational databases for pretraining relational foundation models, as introduced in:
> **PluRel: Synthetic Data unlocks Scaling Laws for Relational Foundation Models**
> Kothapalli, Ranjan, Hudovernik, Dwivedi, Hoffart, Guestrin, Leskovec — arXiv:2602.04029 (2026)
---
## Data Structure
Each entry is a [relbench](https://github.com/snap-stanford/relbench)-compatible `Database` consisting of multiple relational tables.
| Component | Description |
|-----------|-------------|
| Tables | 3–20 per database |
| Primary keys | `row_idx` (auto-generated) |
| Foreign keys | `foreign_row_0`, `foreign_row_1`, ... |
| Feature columns | `feature_0`, `feature_1`, ... (categorical or numerical) |
| Time column | `date` — on activity (leaf) tables only |
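To make the column naming convention concrete, here is a minimal sketch using two hand-built pandas DataFrames. The table names `users` and `events`, all values, and the reading that `foreign_row_0` stores the `row_idx` of the referenced parent row are assumptions made for illustration; they are not taken from the dataset itself.

```python
import pandas as pd

# Hypothetical entity table: primary key plus two feature columns.
users = pd.DataFrame(
    {
        "row_idx": [0, 1, 2],
        "feature_0": ["a", "b", "a"],   # categorical feature
        "feature_1": [0.5, None, 1.7],  # numerical feature with one missing value
    }
)

# Hypothetical activity (leaf) table: primary key, one foreign key, one feature, a timestamp.
events = pd.DataFrame(
    {
        "row_idx": [0, 1, 2, 3],
        "foreign_row_0": [2, 0, 0, 1],  # assumed to reference users.row_idx
        "feature_0": [10.0, 12.5, 9.1, 11.3],
        "date": pd.to_datetime(
            ["1995-03-01", "2001-07-15", "2010-01-09", "2024-11-30"]
        ),
    }
)

# Resolve the foreign key by joining each event to its parent entity row.
joined = events.merge(
    users,
    left_on="foreign_row_0",
    right_on="row_idx",
    how="left",
    suffixes=("_event", "_user"),
)

print(joined[["row_idx_event", "foreign_row_0", "feature_0_event", "feature_0_user", "date"]])
```

Because both tables reuse the generic names `row_idx` and `feature_0`, the join needs explicit suffixes to keep the two sides apart.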
**Schema topology** is sampled from one of three graph generators: `BarabasiAlbert`, `ReverseRandomTree`, or `WattsStrogatz`.
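As a rough illustration of sampling a schema from a random graph model, the sketch below grows a Barabasi-Albert-style graph by preferential attachment and treats each edge as a foreign key. It is not the PluRel implementation: the function name `sample_ba_schema`, the parameter `m`, the child-to-parent edge orientation, and the rule that unreferenced tables are the leaf (activity) tables are all assumptions.

```python
import random


def sample_ba_schema(num_tables: int, m: int = 1, seed: int = 0) -> list[tuple[int, int]]:
    """Return hypothetical foreign-key edges (child, parent) via preferential attachment."""
    rng = random.Random(seed)
    edges: list[tuple[int, int]] = []
    # The first m tables start with one entry each; afterwards a table id is
    # added once per incident edge, so well-connected tables are sampled more often.
    attachment_pool = list(range(m))
    for child in range(m, num_tables):
        parents: set[int] = set()
        while len(parents) < m:
            parents.add(rng.choice(attachment_pool))
        for parent in sorted(parents):
            edges.append((child, parent))
            attachment_pool.extend([child, parent])
    return edges


num_tables = 8
edges = sample_ba_schema(num_tables=num_tables, m=1, seed=42)

# Assumption: tables that no other table references are the leaf (activity) tables.
referenced = {parent for _, parent in edges}
leaf_tables = [t for t in range(num_tables) if t not in referenced]

print("foreign-key edges (child -> parent):", edges)
print("leaf tables:", leaf_tables)
```

Each new table only references tables created before it, so the resulting foreign-key graph is acyclic by construction.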
**Data generation** uses Structural Causal Models (SCMs): column dependencies are modeled as directed acyclic graphs (DAGs), and values are propagated through randomly initialized MLPs. Activity tables additionally include time-series components composed of trend, cycle, and noise terms.
| Parameter | Range |
|-----------|-------|
| Rows per entity table | 500–1,000 |
| Rows per activity table | 2,000–5,000 |
| Columns per table | 3–40 (power-law) |
| Missing values | 1–10% of numerical columns |
| Timestamp range | 1990–2025 |
| Train / Val / Test | 80% / 10% / 10% |
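The generation idea described above can be sketched in a few lines of NumPy. This is a toy version under stated assumptions, not the PluRel generator: the three-column DAG, the one-hidden-layer `random_mlp` helper, the noise scales, and the trend and cycle coefficients are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows = 2000  # within the 2,000-5,000 range listed for activity tables

# Hypothetical column DAG, listed in topological order: column -> parent columns.
dag = {
    "feature_0": [],
    "feature_1": ["feature_0"],
    "feature_2": ["feature_0", "feature_1"],
}


def random_mlp(in_dim: int, hidden: int = 8):
    """Build a fixed, randomly initialized one-hidden-layer MLP: (n, in_dim) -> (n,)."""
    w1 = rng.normal(size=(in_dim, hidden))
    w2 = rng.normal(size=(hidden, 1))
    return lambda x: (np.tanh(x @ w1) @ w2)[:, 0]


columns = {}
for name, parents in dag.items():
    noise = rng.normal(size=n_rows)
    if not parents:
        # Root columns are pure exogenous noise.
        columns[name] = noise
    else:
        # Child columns are a random MLP of their parent columns plus small noise.
        parent_values = np.column_stack([columns[p] for p in parents])
        columns[name] = random_mlp(len(parents))(parent_values) + 0.1 * noise

# Time-series component for an activity table: trend + cycle + noise.
t = np.linspace(0.0, 1.0, n_rows)
series = 2.0 * t + np.sin(2 * np.pi * 5 * t) + 0.1 * rng.normal(size=n_rows)

print({name: values.shape for name, values in columns.items()}, series.shape)
```

Iterating the DAG in topological order guarantees that every parent column exists before a child column is computed from it.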
---
## Download
```bash
huggingface-cli download kvignesh1420/plurel \
  --repo-type dataset \
  --local-dir ~/scratch/pre
```
---
## Usage
Databases are named `rel-synthetic-<seed>` and are fully reproducible:
```python
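from plurel import SyntheticDataset, Config

# Build a synthetic dataset from a fixed seed; the same seed reproduces the same database.
dataset = SyntheticDataset(seed=42, config=Config())
db = dataset.make_db()

# Print the shape of every table in the generated database.
for name, table in db.tables.items():
    print(f"{name}: {table.df.shape}")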
```
See [snap-stanford/plurel](https://github.com/snap-stanford/plurel) for installation, configuration, and training scripts.
---
## Related
| Resource | Link |
|----------|------|
| Pretrained checkpoints | [kvignesh1420/relational-transformer-plurel](https://huggingface.co/kvignesh1420/relational-transformer-plurel) |
| Real-world relbench data | [hvag976/relational-transformer](https://huggingface.co/datasets/hvag976/relational-transformer) |
---
## Citation
```bibtex
@article{kothapalli2026plurel,
title={{PluRel:} Synthetic Data unlocks Scaling Laws for Relational Foundation Models},
author={Kothapalli, Vignesh and Ranjan, Rishabh and Hudovernik, Valter and Dwivedi, Vijay Prakash and Hoffart, Johannes and Guestrin, Carlos and Leskovec, Jure},
journal={arXiv preprint arXiv:2602.04029},
year={2026}
}
```
Provider:
kvignesh1420



