bohlt/dagzoo-default-baseline-demo-20260407
Hugging Face · Updated 2026-04-07 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/bohlt/dagzoo-default-baseline-demo-20260407
Resource description:
---
pretty_name: Default Baseline Synthetic Tabular Corpus
tags:
- tabular
- synthetic
- dagzoo
license: apache-2.0
---
# Default Baseline Synthetic Tabular Corpus
This dataset repository contains synthetic tabular corpora generated with `dagzoo`.
It is a balanced, mixed-type baseline that pairs the default factorized complete-data feature prior with latent-complete-data target-head semantics.
## What is included
- `generated/`: public parquet shards plus per-shard `dataset_catalog.ndjson` summaries
- `curated/`: optional accepted-only shards written later by `dagzoo filter`
- `handoff_manifest.json`: portable corpus identity and provenance metadata
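To check what a downloaded copy actually contains, a small shell sketch can count the files in each of the three locations listed above. It assumes a local download directory (`./dagzoo-demo`, a hypothetical path) and the `hf` CLI from `huggingface_hub`. The helpers `count_files` and `summarize_corpus` are illustrative names, not part of dagzoo, and because the card does not document how shard files are nested under `generated/`, the searches are recursive.

```bash
#!/usr/bin/env bash
# Sketch: summarize a local copy of this repo's public artifacts.
# One way to fetch it first (HF_ENDPOINT is only needed to use the mirror):
#   HF_ENDPOINT=https://hf-mirror.com hf download \
#     bohlt/dagzoo-default-baseline-demo-20260407 \
#     --repo-type dataset --local-dir ./dagzoo-demo
# (older huggingface_hub releases: `huggingface-cli download` with the same flags)

count_files() {
  # Print how many regular files under directory $1 match glob $2 (0 if $1 is absent).
  local dir="$1" pattern="$2"
  if [ -d "$dir" ]; then
    find "$dir" -type f -name "$pattern" | wc -l | tr -d ' '
  else
    echo 0
  fi
}

summarize_corpus() {
  # Report shard, catalog, and manifest presence under the corpus root $1.
  local root="$1"
  echo "generated parquet shards: $(count_files "$root/generated" '*.parquet')"
  echo "per-shard catalogs: $(count_files "$root/generated" 'dataset_catalog.ndjson')"
  # curated/ is optional and only appears after `dagzoo filter` has been run.
  echo "curated parquet shards: $(count_files "$root/curated" '*.parquet')"
  if [ -f "$root/handoff_manifest.json" ]; then
    echo "handoff manifest: present"
  else
    echo "handoff manifest: missing"
  fi
}

summarize_corpus "${1:-./dagzoo-demo}"
```

A missing `curated/` directory is reported as zero shards rather than an error, since that directory is optional by design.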
## Corpus summary
- generated datasets: 2
- task mix: classification (2)
- train rows per dataset: 768
- test rows per dataset: 256
- features per dataset: 16 to 49
- classes per classification dataset: 4 to 5
- observed feature types: cat, num
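The row counts above combine into a few derived figures: each dataset has 768 + 256 = 1,024 rows (a 75/25 train/test split), and the two datasets hold 2,048 rows in total. A minimal shell check of that arithmetic, using only the numbers from the summary:

```bash
#!/usr/bin/env bash
# Figures copied from the corpus summary.
num_datasets=2
train_rows=768
test_rows=256

rows_per_dataset=$(( train_rows + test_rows ))        # 1024
total_rows=$(( num_datasets * rows_per_dataset ))     # 2048
train_pct=$(( 100 * train_rows / rows_per_dataset ))  # 75 (integer division, exact here)

echo "rows per dataset: ${rows_per_dataset}"
echo "total rows: ${total_rows}"
echo "train share: ${train_pct}%"
```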
## Generation provenance
- source family: `dagzoo.heterogeneous_scm`
- generate run id: `98276f918489ba560f149fa887820470`
- generated corpus id: `e8bc35defe4260a3118f3cee1ee08853`
- config reference: `recipe:default-baseline`
- target derivation: `tabiclv2_latent_node`
- missingness: `none`
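The provenance ids can be used to confirm that a local copy is the documented corpus. The sketch below assumes, without confirmation from this card, that `handoff_manifest.json` records the generated corpus id verbatim. Because the manifest schema is not described here, it uses a fixed-string `grep` rather than a JSON field lookup; `check_corpus_id` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Generated corpus id from the provenance list above.
expected_corpus_id="e8bc35defe4260a3118f3cee1ee08853"

check_corpus_id() {
  # Assumption: the manifest under root $1 contains the corpus id as a literal string.
  local manifest="$1/handoff_manifest.json"
  if [ ! -f "$manifest" ]; then
    echo "missing: $manifest" >&2
    return 2
  fi
  if grep -qF "$expected_corpus_id" "$manifest"; then
    echo "corpus id matches"
  else
    echo "corpus id not found in $manifest" >&2
    return 1
  fi
}
```

For example, `check_corpus_id ./dagzoo-demo` after downloading the repo to that (hypothetical) directory.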
## Publishing workflow
This repo is produced from a local handoff root with:
```bash
dagzoo generate --config recipe:<name> --num-datasets <n> --handoff-root handoffs/<run_name>
dagzoo publish hub --handoff-root handoffs/<run_name> --repo-id <namespace/name>
```
Only the public handoff artifacts are uploaded. Local `internal/` sidecars stay on disk for dagzoo tooling and are not published to the Hub.
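With this card's values substituted into the template, the two-step workflow looks roughly as follows. The recipe name, dataset count, and repo id come from the card; the run name `default-baseline-demo` is hypothetical, and the card does not state whether a rerun reproduces the run or corpus ids listed under provenance. The small `run` wrapper (also illustrative) prints each command unless `DRY_RUN=0` is set.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Values from this card; run_name is a hypothetical local directory name.
recipe="default-baseline"
num_datasets=2
run_name="default-baseline-demo"
repo_id="bohlt/dagzoo-default-baseline-demo-20260407"
handoff_root="handoffs/${run_name}"

run() {
  # Print the command by default; execute it only when DRY_RUN=0.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

run dagzoo generate --config "recipe:${recipe}" --num-datasets "${num_datasets}" --handoff-root "${handoff_root}"
run dagzoo publish hub --handoff-root "${handoff_root}" --repo-id "${repo_id}"
```

Printing by default lets you review the exact invocations before anything is pushed to the Hub.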
## Links
- dagzoo repo: [github.com/bensonlee5/dagzoo](https://github.com/bensonlee5/dagzoo)
- dagzoo docs: [bensonlee5.github.io/dagzoo/docs/](https://bensonlee5.github.io/dagzoo/docs/)
## References
- Dagzoo packaged baseline recipe.
Provided by:
bohlt