TYTSTQ/ordinary-bench-subset-ablation

Name: TYTSTQ/ordinary-bench-subset-ablation
Creator: TYTSTQ
Published: 2026-04-01 18:58:14
License: MIT (per the dataset card)

Source: Hugging Face. Updated 2026-04-01; indexed 2026-04-12.

Download link:

https://hf-mirror.com/datasets/TYTSTQ/ordinary-bench-subset-ablation


Description:

---
configs:
- config_name: default
  default: true
  data_files:
  - split: train
    path: data/default/train*.parquet
task_categories:
- visual-question-answering
language:
- en
license: mit
tags:
- spatial-reasoning
- vlm-benchmark
- ordinal-relations
- 3d-scenes
- multi-view
- ablation-study
- subset-sensitivity
size_categories:
- 100K<n<1M
---

# ORDINARY-BENCH Subset Ablation Dataset

An ablation dataset testing whether VLMs are affected by **irrelevant objects** in the scene. For each parent scene (6-10 objects), all C(N,4) four-object subsets are re-rendered, and the **full QRR question bank** is asked, including questions about objects NOT present in the subset image.

> Main benchmark: [TYTSTQ/ordinary-bench](https://huggingface.co/datasets/TYTSTQ/ordinary-bench)
>
> Multi-view version: [TYTSTQ/ordinary-bench-multiview](https://huggingface.co/datasets/TYTSTQ/ordinary-bench-multiview)
>
> Source code: [GitHub - tasd12-ty/ordinary-bench-core](https://github.com/tasd12-ty/ordinary-bench-core)

## Overview

| | |
|---|---|
| Parent scenes | 10 (n06-n10, 2 per complexity level) |
| Subsets | 912 (all C(N,4) combinations) |
| Questions per subset | Full master QRR bank from parent scene |
| Total questions | 624,963 |
| Answerable | 13,543 (2.2%): all 4 referenced objects present |
| N/A (refusal expected) | 611,420 (97.8%): ≥1 referenced object missing |
| Images per subset | 5 (1 single-view + 4 multi-view) |

## Experimental Design

1. **Parent scenes**: 10 test scenes with 6-10 objects each
2. **Subset enumeration**: All C(N,4) four-object subsets (same positions, camera unchanged)
3. **Re-rendering**: Each subset is rendered with only 4 objects (Blender, same camera angles)
4. **Master QRR bank**: All pairwise distance comparisons from the parent scene (disjoint + shared_anchor + FDR decomposition)
5. **Question assignment**: Each question is labeled `answerable` (all referenced objects present) or N/A (≥1 missing)

### Key insight

When aggregating across subsets of the same parent scene, N/A answers can be **ignored**: only answerable predictions matter. This enables cross-subset consistency analysis.

## Quick Start

```python
from datasets import load_dataset

ds = load_dataset("TYTSTQ/ordinary-bench-subset-ablation", split="train")

sample = ds[0]
sample["single_view"]      # PIL Image (single-view, 480x320)
sample["view_0"]           # PIL Image (multi-view camera 0)
sample["view_1"]           # PIL Image (multi-view camera 1)
sample["view_2"]           # PIL Image (multi-view camera 2)
sample["view_3"]           # PIL Image (multi-view camera 3)
sample["answerable"]       # True/False
sample["missing_objects"]  # JSON: [] or ["obj_5", "obj_7"]
sample["gt_comparator"]    # "<", "~=", or ">"
sample["parent_scene_id"]  # "n10_000082"

# Filter to answerable questions only
answerable = ds.filter(lambda x: x["answerable"])

# Filter by parent scene
parent_subset = ds.filter(lambda x: x["parent_scene_id"] == "n10_000082")
```

## Column Schema

| Column | Type | Description |
|--------|------|-------------|
| `scene_id` | string | Subset scene ID, e.g., `n10_000082__s0042` |
| `parent_scene_id` | string | Parent scene ID, e.g., `n10_000082` |
| `n_objects` | int | Objects in subset (always 4) |
| `n_objects_parent` | int | Objects in parent scene (6-10) |
| `single_view` | Image | Single-view render (480x320 PNG) |
| `view_0` | Image | Multi-view camera 0 (az=45°) |
| `view_1` | Image | Multi-view camera 1 (az=135°) |
| `view_2` | Image | Multi-view camera 2 (az=225°) |
| `view_3` | Image | Multi-view camera 3 (az=315°) |
| `objects` | string | JSON: objects visible in this subset |
| `all_objects_in_parent` | string | JSON: all objects in parent scene |
| `qid` | string | Question ID from master bank, e.g., `mqrr_0001` |
| `question_type` | string | Always `qrr` |
| `variant` | string | `disjoint` or `shared_anchor` |
| `answerable` | bool | True if all referenced objects are present |
| `missing_objects` | string | JSON list of object IDs not in subset |
| `gt_comparator` | string | Ground truth: `<`, `~=`, or `>` |
| `pair1` | string | JSON: first object pair |
| `pair2` | string | JSON: second object pair |
| `anchor` | string | Anchor object (shared_anchor variant) |
| `source` | string | `enumerate_qrr` or `fdr_decomposition` |
| `source_fdr_qid` | string | Original FDR question ID (if decomposed) |

## Parent Scenes

| Parent | N objects | C(N,4) subsets | Master QRR questions |
|--------|-----------|----------------|---------------------|
| n06_000080 | 6 | 15 | - |
| n06_000083 | 6 | 15 | - |
| n07_000087 | 7 | 35 | - |
| n07_000088 | 7 | 35 | - |
| n08_000084 | 8 | 70 | - |
| n08_000087 | 8 | 70 | - |
| n09_000083 | 9 | 126 | - |
| n09_000097 | 9 | 126 | - |
| n10_000082 | 10 | 210 | - |
| n10_000098 | 10 | 210 | - |
| **Total** | | **912** | **624,963** |

## Source Code

The full subset ablation pipeline is at `experiments/subset_ablation/` in the source repo:

| Script | Purpose |
|--------|---------|
| `enumerate_subsets.py` | C(N,4) subset enumeration |
| `render_subsets.py` | Batch rendering (single + multi-view) |
| `generate_master_questions.py` | Full QRR bank (incl. FDR decomposition) |
| `assign_subset_questions.py` | Per-subset question assignment + N/A labels |
| `run_subset_eval.py` | VLM evaluation with N/A support |
| `analyze_results.py` | Cross-subset consistency analysis |

## License

MIT
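
## Worked example: subset counts and answerability

The subset enumeration and question-assignment steps described under Experimental Design can be sketched in a few lines. This is an illustrative sketch only, not the code in `enumerate_subsets.py` or `assign_subset_questions.py`: the helper names `enumerate_subsets` and `label_question` are hypothetical, and the 6-object scene with IDs `obj_0` to `obj_5` is made up for the example. It checks the 912-subset total from the parent-scene sizes and shows how a single question would be labeled answerable or N/A in each subset.

```python
from itertools import combinations
from math import comb

# Parent-scene sizes as listed in the Parent Scenes table (two scenes per size).
PARENT_SIZES = [6, 6, 7, 7, 8, 8, 9, 9, 10, 10]


def enumerate_subsets(object_ids, k=4):
    """Return every k-object subset of a parent scene (hypothetical helper)."""
    return [frozenset(c) for c in combinations(object_ids, k)]


def label_question(referenced, subset):
    """Label a question for one subset (hypothetical helper).

    A question is answerable only if every object it references is in the
    subset; otherwise it is N/A and the missing object IDs are recorded.
    """
    missing = sorted(set(referenced) - set(subset))
    return {"answerable": not missing, "missing_objects": missing}


# Total number of four-object subsets across the ten parent scenes.
total_subsets = sum(comb(n, 4) for n in PARENT_SIZES)

# Made-up 6-object parent scene and one question referencing four objects.
objects = [f"obj_{i}" for i in range(6)]
subsets = enumerate_subsets(objects)
question = ["obj_0", "obj_1", "obj_2", "obj_3"]

labels = [label_question(question, s) for s in subsets]
n_answerable = sum(label["answerable"] for label in labels)

print(total_subsets)   # subsets over all parent scenes
print(len(subsets))    # subsets of the 6-object example scene
print(n_answerable)    # subsets in which the example question is answerable
```

In this toy scene a question that references four distinct objects is answerable in exactly one of the 15 subsets, which is consistent with the small answerable fraction reported in the Overview table.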

Provider:

TYTSTQ
