
bohlt/dagzoo-default-baseline-demo-20260407

Source: Hugging Face. Updated 2026-04-07; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/bohlt/dagzoo-default-baseline-demo-20260407
Description:
---
pretty_name: Default Baseline Synthetic Tabular Corpus
tags:
- tabular
- synthetic
- dagzoo
license: apache-2.0
---

# Default Baseline Synthetic Tabular Corpus

This dataset repository contains synthetic tabular corpora generated with `dagzoo`.

Balanced mixed-type baseline with the default factorized complete-data feature prior plus latent-complete-data target-head semantics.

## What is included

- `generated/`: public parquet shards plus per-shard `dataset_catalog.ndjson` summaries
- `curated/`: optional accepted-only shards written later by `dagzoo filter`
- `handoff_manifest.json`: portable corpus identity and provenance metadata

## Corpus summary

- generated datasets: 2
- task mix: classification (2)
- train rows per dataset: 768
- test rows per dataset: 256
- features per dataset: 16 to 49
- classes per classification dataset: 4 to 5
- observed feature types: cat, num

## Generation provenance

- source family: `dagzoo.heterogeneous_scm`
- generate run id: `98276f918489ba560f149fa887820470`
- generated corpus id: `e8bc35defe4260a3118f3cee1ee08853`
- config reference: `recipe:default-baseline`
- target derivation: `tabiclv2_latent_node`
- missingness: `none`

## Publishing workflow

This repo is produced from a local handoff root with:

```bash
dagzoo generate --config recipe:<name> --num-datasets <n> --handoff-root handoffs/<run_name>
dagzoo publish hub --handoff-root handoffs/<run_name> --repo-id <namespace/name>
```

Only the public handoff artifacts are uploaded. Local `internal/` sidecars stay on disk for dagzoo tooling and are not published to the Hub.

## Links

- dagzoo repo: [github.com/bensonlee5/dagzoo](https://github.com/bensonlee5/dagzoo)
- dagzoo docs: [bensonlee5.github.io/dagzoo/docs/](https://bensonlee5.github.io/dagzoo/docs/)

## References

- Dagzoo packaged baseline recipe.
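
The layout listed under "What is included" can be explored from a local copy of the repository. The following is a minimal sketch, not part of dagzoo itself. It assumes only that the `dataset_catalog.ndjson` files sit somewhere under `generated/` and hold one JSON object per line, as the NDJSON format implies. The helper names `read_ndjson` and `load_catalogs` are hypothetical, and the record fields are not documented in this card, so the code does not rely on any of them.

```python
import json
from pathlib import Path


def read_ndjson(path):
    """Parse a newline-delimited JSON file into a list of objects, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def load_catalogs(root):
    """Collect records from every dataset_catalog.ndjson under <root>/generated/.

    The exact shard directory naming is an assumption left open here:
    the search is recursive, so any nesting depth is accepted.
    """
    generated = Path(root) / "generated"
    records = []
    for catalog in sorted(generated.rglob("dataset_catalog.ndjson")):
        records.extend(read_ndjson(catalog))
    return records
```

Calling `load_catalogs("path/to/local/copy")` returns a plain list of dictionaries, one per catalog line, which can then be inspected to discover which fields the summaries actually carry.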
Provider:
bohlt