
klokedm/tabnetics-runs

Hugging Face · Updated 2026-04-16 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/klokedm/tabnetics-runs
Resource description:
---
license: apache-2.0
task_categories:
- tabular-classification
tags:
- tabular
- benchmark
- feature-selection
- classification
- automl
- tabnetics
pretty_name: Tabnetics Validation Runs
size_categories:
- 100K<n<1M
---

# Tabnetics Validation Runs

Per-run, per-dataset, per-seed, per-method experimental results from the **tabnetics** automated tabular-classification pipeline across validation campaigns val-18 through val-21.

| Statistic | Value |
|-----------|-------|
| Rows | 140,403 |
| Columns | 176 |
| Campaigns | val-18, val-19, val-20, val-21 |
| Unique datasets | 63 |
| Unique pipeline profiles | 278 |
| Seeds | 11, 23, 37, 42, 59, 67, 73, 89, 97 |
| Winner rows | 56,702 |
| Classifier-candidate rows | 83,701 |

## Row types

Every row represents one **(dataset, seed, classifier)** trial:

- **`winner`**: the pipeline's final selected classifier for that (dataset, seed, profile) run. These rows carry full holdout metrics (accuracy, balanced_accuracy, macro_f1, hybrid_score, roc_auc, etc.) plus timing and feature-selection details from the CSV results.
- **`classifier_candidate`**: a non-winning classifier from the model cross-validation stage. These rows carry the CV score (`model_cv_score`) and train-test gap (`model_cv_train_test_gap`) but *not* holdout metrics, since only the winner was evaluated on the held-out test set.

## Key columns

### Identity & metadata

| Column | Description |
|--------|-------------|
| `campaign` | Validation campaign (val-18 through val-21) |
| `profile` | Pipeline profile name |
| `dataset_shard` | Dataset shard identifier (ds0, ds1, etc.) |
| `run_timestamp` | Run timestamp |
| `dataset_id` | OpenML / internal dataset identifier |
| `dataset_name` | Human-readable dataset name |
| `tier` / `effective_tier` | Dataset complexity tier |
| `domain` | Dataset domain |
| `seed` | Random seed |
| `row_type` | `winner` or `classifier_candidate` |
| `is_winner` | Boolean flag |

### Performance metrics (winner rows)

| Column | Description |
|--------|-------------|
| `accuracy` | Holdout accuracy |
| `balanced_accuracy` | Holdout balanced accuracy |
| `macro_f1` | Holdout macro-F1 |
| `hybrid_score` | Composite hybrid score |
| `roc_auc` | ROC-AUC (various curve types) |

### Model selection (all rows)

| Column | Description |
|--------|-------------|
| `model` | Classifier name |
| `model_cv_score` | Cross-validation score during model selection |
| `model_cv_train_test_gap` | CV train-test gap (overfitting indicator) |

### Feature selection

| Column | Description |
|--------|-------------|
| `selection_strategy` | FS strategy used |
| `n_features_selected` | Number of features after selection |
| `n_portfolio_candidates` | Size of the FS portfolio |
| `fs_method_preset` | FS method preset name |

### Timing

| Column | Description |
|--------|-------------|
| `fs_time_sec` | Feature-selection wall time |
| `dist_time_sec` | Distribution fitting wall time |
| `classification_stage2_wall_sec` | Classification stage-2 wall time |

### Configuration flags (`cfg_*` columns)

About 40 boolean/string configuration flags from each run's metadata, prefixed with `cfg_`. These capture the exact pipeline settings for reproducibility. The raw JSON is also available in `config_flags_json`.
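As a rough illustration of working with the configuration columns, the sketch below selects the `cfg_`-prefixed columns by name and decodes `config_flags_json`. It runs on a tiny hand-made DataFrame rather than the real data: the flag names `cfg_example_flag` and `cfg_example_mode` are hypothetical, and it assumes each `config_flags_json` cell holds a JSON object serialized as a string, which the card does not state explicitly.

```python
import json

import pandas as pd

# Toy stand-in for the real table. The two cfg_ flag names and the JSON keys
# are hypothetical; only the "cfg_" prefix and the "config_flags_json" column
# name come from the dataset card.
df = pd.DataFrame({
    "dataset_name": ["wdbc", "wdbc"],
    "seed": [42, 42],
    "cfg_example_flag": [True, False],
    "cfg_example_mode": ["fast", "full"],
    "config_flags_json": [
        json.dumps({"example_flag": True, "example_mode": "fast"}),
        json.dumps({"example_flag": False, "example_mode": "full"}),
    ],
})

# Pick out every flattened configuration column by its prefix.
cfg_cols = [c for c in df.columns if c.startswith("cfg_")]

# Decode the raw JSON into one dict per row (assumes a JSON-object string).
raw_flags = df["config_flags_json"].map(json.loads)

# Count how many distinct flag combinations appear in the table.
n_configs = df[cfg_cols].drop_duplicates().shape[0]

print(cfg_cols)
print(raw_flags.iloc[0])
print(n_configs)
```

On the real table the same prefix filter should return the roughly 40 flag columns, and grouping by them is one way to compare runs that share identical settings.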
## Usage

```python
from datasets import load_dataset

ds = load_dataset("klokedm/tabnetics-runs", split="train")
df = ds.to_pandas()

# Winners only
winners = df[df["row_type"] == "winner"]

# All rows (winner and candidates) for one dataset and seed
cands = df[(df["dataset_name"] == "wdbc") & (df["seed"] == 42)]

# Compare campaigns (holdout metrics are populated only on winner rows)
winners.groupby("campaign")["balanced_accuracy"].mean()
```

## Source

Built with [`scripts/build_hf_runs_dataset.py`](https://github.com/klokedm/tabnetics/blob/main/scripts/build_hf_runs_dataset.py).

**Library:** [tabnetics on PyPI](https://pypi.org/project/tabnetics/) · [GitHub](https://github.com/klokedm/tabnetics) · [Documentation](https://tabnetics.org)
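Because winner and candidate rows share `model_cv_score`, one possible analysis is to compare each winner's CV score with the best non-winning candidate from the same run. The sketch below does this on a three-row toy DataFrame. The model names are placeholders, and treating (`campaign`, `profile`, `dataset_name`, `seed`) as a unique run key is an assumption; on the real data `dataset_shard` or `run_timestamp` may also be needed to separate runs.

```python
import pandas as pd

# Toy rows for a single run: one winner and two candidates.
# Model names and scores are made up for illustration.
df = pd.DataFrame({
    "campaign": ["val-21"] * 3,
    "profile": ["p0"] * 3,
    "dataset_name": ["wdbc"] * 3,
    "seed": [42] * 3,
    "row_type": ["winner", "classifier_candidate", "classifier_candidate"],
    "model": ["model_a", "model_b", "model_c"],
    "model_cv_score": [0.97, 0.95, 0.93],
    "balanced_accuracy": [0.96, None, None],  # holdout metric: winner only
})

# Assumed run key; adjust if runs are not unique under these columns.
run_keys = ["campaign", "profile", "dataset_name", "seed"]

winners = df[df["row_type"] == "winner"].set_index(run_keys)
candidates = df[df["row_type"] == "classifier_candidate"]

# Best CV score among the non-winning candidates of each run.
best_candidate_cv = candidates.groupby(run_keys)["model_cv_score"].max()

# Winner's CV score minus the best candidate's, aligned on the run key.
cv_margin = (winners["model_cv_score"] - best_candidate_cv).rename("cv_margin")

print(cv_margin)
```

A small or negative margin would suggest the pipeline chose the winner on something other than raw CV score alone, though the card does not describe the selection rule.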
Provider:
klokedm