8Planetterraforming/Cube-Multi-Object-Consistency-Dataset
Hugging Face · updated 2026-04-07 · indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/8Planetterraforming/Cube-Multi-Object-Consistency-Dataset
---
license: mit
---
# Cube Multi-Object Consistency Dataset
This project explores a structured visual reasoning problem: maintaining exact object count, geometry, indexing, and attribute consistency across a multi-object scene.
## Task Description
The reference scene consists of **26 isometric cubes** arranged in a strict five-row layout (`6 / 6 / 6 / 6 / 2`):
- row 1: 6 cubes
- row 2: 6 cubes
- row 3: 6 cubes
- row 4: 6 cubes
- row 5: 2 cubes
Each cube must:
- preserve its position
- preserve spacing
- preserve geometry
- contain a unique index from **1 to 26**
- optionally include an additional attribute (e.g., a letter or an object such as a planet)
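
The layout above can be written down as a small machine-readable specification, which is useful as a reference when checking generated images. This is a minimal sketch, not the script used to produce the reference image: the names `ROW_SIZES` and `build_reference_layout` are hypothetical, and assigning indices row by row from left to right is an assumption, since the card does not state the numbering direction.

```python
# Hypothetical spec of the reference scene; all names are illustrative only.
ROW_SIZES = [6, 6, 6, 6, 2]  # row sizes taken from the task description


def build_reference_layout(row_sizes=ROW_SIZES):
    """Return one record per cube: 1-based index, 0-based row and column.

    Assumption: indices run row by row, left to right.
    """
    cubes = []
    index = 1
    for row, size in enumerate(row_sizes):
        for col in range(size):
            cubes.append({"index": index, "row": row, "col": col})
            index += 1
    return cubes


layout = build_reference_layout()
print(len(layout))   # total number of cubes in the scene
print(layout[-1])    # the last cube, which sits in the short fifth row
```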
---
## Goal
The goal is to evaluate whether generative models can:
- maintain exact object count
- preserve spatial structure
- correctly assign symbolic labels
- maintain consistency when adding per-object detail
---
## Key Observation
As visual and object-level complexity increases, model reliability decreases.
When the model generates:
- only cubes → mostly correct
- cubes + numbers → small errors
- cubes + numbers + letters → more errors
- cubes + numbers + letters + unique objects → frequent failures
This indicates that **multi-object consistency breaks as constraints increase**.

---
## Failure Modes Observed
Across generated images, the following errors were observed:
- duplicated numbers (e.g. repeated "2")
- missing numbers (e.g. no "22")
- incorrect ordering
- incorrect row structure
- merged or overlapping cubes
- broken spacing
- attribute mismatch (letter does not match number)
- inconsistent mapping between object and label
- hallucinated values (e.g. "29" instead of "24")
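
Several of the numbering errors listed above (duplicates, missing values, out-of-range values) can be detected mechanically once the labels have been read from an image. The sketch below assumes the labels are already available as a list of integers; how they are extracted (manually or by OCR) is outside its scope, and the function name `check_labels` and the sample reading are hypothetical.

```python
from collections import Counter


def check_labels(labels, expected=range(1, 27)):
    """Compare labels read from an image against the expected index set."""
    expected_set = set(expected)
    counts = Counter(labels)
    return {
        "count_ok": len(labels) == len(expected_set),
        "duplicated": sorted(v for v, n in counts.items() if n > 1),
        "missing": sorted(expected_set - set(counts)),
        "out_of_range": sorted(set(counts) - expected_set),
    }


# Hypothetical reading of a generated image, built from the correct sequence:
observed = list(range(1, 27))
observed[2] = 2        # "3" rendered as a second "2"
observed[23] = 29      # "24" rendered as "29"
observed.remove(22)    # "22" absent altogether

report = check_labels(observed)
for key, value in report.items():
    print(key, value)
```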
---
## Image Set (7 examples)
### 1. ✅ Correct Reference (Colab)
**`image_01_colab_reference.png`**
- generated programmatically
- perfect geometry
- correct layout: `6 / 6 / 6 / 6 / 2`
- correct numbering 1–26
- no duplicates, no missing values

This image is the **ground truth**.

---
### 2. ⚠️ ChatGPT Generated (from scratch, not editing reference)
**`image_02_chatgpt_generated.png`**

- visually similar style
- but structurally incorrect:
  - duplicated numbers
  - missing numbers
  - broken layout

This shows that **visual plausibility ≠ structural correctness**.

---
### 3. ❌ Stylized Version (parchment attempt)
**`image_03_stylized_fail.png`**

- attempted aesthetic transformation
- structure not preserved
- numbering corrupted

Failure cause:
> the model regenerated the scene instead of preserving it
---
### 4. ❌ Multi-object (letters + numbers)
**`image_04_letters_fail.png`**

- added letter labels (A–Z)
- errors:
  - mismatch between letters and numbers
  - shifted assignments

Failure cause:
> increased symbolic complexity
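
A letter/number mismatch of this kind can be checked with a simple lookup. The sketch below assumes that cube `k` should carry the `k`-th letter of the alphabet (1 → A, ..., 26 → Z); the card does not spell out the pairing rule, so treat that mapping, the function names, and the sample pairs as hypothetical.

```python
import string


def expected_letter(index):
    """Letter assumed for a cube index: 1 -> 'A', ..., 26 -> 'Z'."""
    if not 1 <= index <= 26:
        raise ValueError(f"index out of range: {index}")
    return string.ascii_uppercase[index - 1]


def find_mismatches(pairs):
    """Return the (index, letter) pairs that violate the assumed mapping."""
    return [(i, letter) for i, letter in pairs if expected_letter(i) != letter]


# Hypothetical reading in which the letters are shifted by one from cube 3 onward.
pairs = [(1, "A"), (2, "B"), (3, "D"), (4, "E")]
mismatches = find_mismatches(pairs)
print(mismatches)
```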
---
### 5. ❌ Multi-object + visual attributes (planets)
**`image_05_planets_fail.png`**

- each cube is assigned a unique planet-like object
- errors increase significantly:
  - incorrect numbering
  - duplicated indices
  - attribute mismatch

Failure cause:
> per-object visual uniqueness breaks global consistency
---
### 6. ❌ High-detail multi-object scene
**`image_06_complex_fail.png`**

- more detail per object
- increased variation

Observed:
- structural drift
- loss of alignment
- incorrect assignments
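
Structural drift and loss of alignment can be quantified by comparing cube centres in a generated image with those in the reference. This is only a sketch under stated assumptions: it presumes the centres have already been located and matched by cube index, and the coordinates, the tolerance, and the names `max_drift` and `is_aligned` are all hypothetical.

```python
import math


def max_drift(reference, detected):
    """Largest distance between matching cube centres, keyed by cube index."""
    return max(math.dist(reference[i], detected[i]) for i in reference)


def is_aligned(reference, detected, tolerance):
    """True if both scenes hold the same cubes and none moved beyond `tolerance`."""
    if set(reference) != set(detected):
        return False
    return max_drift(reference, detected) <= tolerance


# Hypothetical centres in pixels for three cubes; cube 3 has drifted.
reference = {1: (0.0, 0.0), 2: (100.0, 0.0), 3: (200.0, 0.0)}
detected = {1: (1.0, 0.0), 2: (100.0, 2.0), 3: (212.0, 5.0)}

print(max_drift(reference, detected))
print(is_aligned(reference, detected, tolerance=5.0))
```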
---
### 7. ❌ Extreme case (combined constraints)
**`image_07_extreme_fail.png`**

- multiple constraints combined:
  - geometry
  - numbering
  - letters
  - unique objects

Result:
- the model fails across multiple dimensions simultaneously
---
## Key Insight
This dataset demonstrates a critical limitation:
> Models can produce visually convincing outputs while failing at exact structured reasoning.
As the number of simultaneous constraints increases, the probability of failure rises.
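
One simple way to see why failures compound is a back-of-the-envelope model in which every constraint on every cube is satisfied independently with the same probability. That independence is an assumption made purely for illustration, not a measured property of any model, and the value `0.99` below is invented to show the trend rather than taken from the experiments.

```python
def scene_success_probability(p_single, n_objects=26, n_constraints=1):
    """Probability that every constraint holds on every object.

    Assumes each (object, constraint) pair succeeds independently
    with probability `p_single`.
    """
    return p_single ** (n_objects * n_constraints)


# Illustrative only: p_single = 0.99 is a made-up per-constraint success rate.
for k in range(1, 5):
    print(k, round(scene_success_probability(0.99, n_constraints=k), 3))
```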
---
## Why This Matters
Most visual evaluations focus on realism.
However, real-world applications often require:
- exact counting
- exact indexing
- strict spatial consistency
- correct object-to-label mapping
This dataset exposes failures that are:
- visually subtle
- but logically critical
---
## Conclusion
The experiments show that:
- simple scenes → mostly correct
- structured scenes → partially correct
- multi-object structured scenes → unstable
This highlights the need for benchmarks that measure **precision, not just appearance**.

---
## Files
Provided by: 8Planetterraforming



