qinglinhou/CLEVR-MATE
Hugging Face | Updated: 2026-03-17 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/qinglinhou/CLEVR-MATE
Resource description:
---
configs:
- config_name: 2d
  data_files:
  - split: train
    path: 2d/samples.jsonl
- config_name: pyrender
  data_files:
  - split: train
    path: pyrender/samples.jsonl
- config_name: blender
  data_files:
  - split: train
    path: blender/samples.jsonl
- config_name: 2d_strict
  data_files:
  - split: train
    path: 2d_strict/samples.jsonl
- config_name: pyrender_strict
  data_files:
  - split: train
    path: pyrender_strict/samples.jsonl
- config_name: blender_strict
  data_files:
  - split: train
    path: blender_strict/samples.jsonl
dataset_info:
  features:
  - name: id
    dtype: string
  - name: image_path
    dtype: string
  - name: scene
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: task
    dtype: string
  - name: object_count
    dtype: int32
  - name: pointer_attribute
    dtype: string
  - name: target_attribute
    dtype: string
---
# CLEVR-MATE
A MATE-style cross-modal entity-linking dataset generated from CLEVR-style scenes, intended for probe training.
## Variants
| Config | Renderer | Scenes | Samples | Strict | Description |
|--------|----------|--------|---------|--------|-------------|
| 2d | Pillow | 3,000 | 18,000 | No | Fast 2D shape rendering |
| pyrender | pyrender | 3,000 | 18,000 | No | Offscreen 3D rendering |
| blender | Blender Cycles | 3,000 | 18,000 | No | Photorealistic ray-traced rendering |
| 2d_strict | Pillow | 3,000 | 18,000 | Yes | 2D, strict cross-modal |
| pyrender_strict | pyrender | 3,000 | 18,000 | Yes | 3D, strict cross-modal |
| blender_strict | Blender Cycles | 3,000 | 18,000 | Yes | Photorealistic, strict cross-modal |
### MATE-consistent vs Strict
- **MATE-consistent** (default): Follows MATE's convention of stripping only the visual attribute (color or shape) that serves as the pointer or target from the scene JSON. The other visual attribute remains, which can enable a "bridging shortcut". For example, if color is the pointer, shape is still present in the JSON and visible in the image, so the model could match objects by shape alone without true cross-modal binding.
- **Strict**: Both color AND shape are stripped from the scene JSON, forcing the model to rely on non-visual attributes (name, size, rotation, 3d_coords) for entity linking. This eliminates the bridging shortcut and tests true cross-modal binding ability.
The strict variants share the same images as their non-strict counterparts — only the `samples.jsonl` (scene JSON filtering) differs.
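The difference between the two filtering modes can be made concrete with a small sketch. The helper `filter_scene` below is hypothetical and is not the dataset's generation code. It assumes a scene is a list of per-object dicts, and it simply applies the stripping rules stated above.

```python
VISUAL_ATTRIBUTES = ("color", "shape")


def filter_scene(objects, pointer, target, strict=False):
    """Return copies of the scene objects with visual attributes removed.

    Hypothetical helper: `objects` is assumed to be a list of per-object dicts.
    MATE-consistent mode drops only the visual attribute used as the pointer
    or target; strict mode drops both color and shape.
    """
    if strict:
        drop = set(VISUAL_ATTRIBUTES)
    else:
        drop = {attr for attr in (pointer, target) if attr in VISUAL_ATTRIBUTES}
    # Build new dicts so the input scene is left unmodified
    return [{k: v for k, v in obj.items() if k not in drop} for obj in objects]


# Invented two-object scene, for illustration only
scene = [
    {"name": "obj_0", "color": "yellow", "shape": "cube", "size": "large"},
    {"name": "obj_1", "color": "red", "shape": "sphere", "size": "small"},
]

# img2data question: color is the pointer, name is the target
consistent = filter_scene(scene, pointer="color", target="name")
strict = filter_scene(scene, pointer="color", target="name", strict=True)
print(consistent[0])  # {'name': 'obj_0', 'shape': 'cube', 'size': 'large'}
print(strict[0])      # {'name': 'obj_0', 'size': 'large'}
```

The `shape` key that survives in `consistent` is the bridging shortcut that the strict variants remove.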
## Usage
```python
from datasets import load_dataset
# Load a specific variant
ds = load_dataset("qinglinhou/CLEVR-MATE", "blender")
# Load the strict variant
ds_strict = load_dataset("qinglinhou/CLEVR-MATE", "blender_strict")
```
## Fields
- **id**: unique hex identifier
- **image_path**: relative path to the scene image (e.g. `images/scene_000042.png`)
- **scene**: filtered scene description, stored as a Python repr string rather than strict JSON. Which visual attributes are stripped depends on the variant (see above).
- **question**: cross-modal question (e.g. "What is the name of the yellow colored object?")
- **answer**: gold answer
- **task**: `img2data` or `data2img`
- **object_count**: number of objects in the scene (3-8)
- **pointer_attribute**: attribute the question uses to refer to the object
- **target_attribute**: attribute the question asks for
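Because `scene` is a Python repr, `json.loads` would reject it (for example, because of its single-quoted strings). A minimal sketch of parsing it with the standard library's `ast.literal_eval` follows. The helper name `scene_objects`, the assumed layout (a list of object dicts, or a CLEVR-style dict with an `"objects"` key), and the example row are all hypothetical.

```python
import ast


def scene_objects(scene_field):
    """Parse a `scene` string (Python repr, not JSON) into a list of object dicts."""
    parsed = ast.literal_eval(scene_field)  # evaluates Python literals only, no code
    # Assumption: the repr is either a list of object dicts or a CLEVR-style
    # dict holding them under "objects"; adjust if the real layout differs.
    if isinstance(parsed, dict) and "objects" in parsed:
        return parsed["objects"]
    return parsed


# Hypothetical row; the values are invented for illustration only.
row = {
    "scene": (
        "[{'name': 'obj_0', 'size': 'large', 'rotation': 90.0, '3d_coords': [1.0, -2.0, 0.7]}, "
        "{'name': 'obj_1', 'size': 'small', 'rotation': 15.0, '3d_coords': [-1.5, 0.5, 0.35]}, "
        "{'name': 'obj_2', 'size': 'large', 'rotation': 240.0, '3d_coords': [2.5, 1.0, 0.7]}]"
    ),
    "object_count": 3,
}

objects = scene_objects(row["scene"])
assert len(objects) == row["object_count"]  # sanity check against object_count
print([obj["name"] for obj in objects])  # ['obj_0', 'obj_1', 'obj_2']
```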
## Task Types
16 unique (task, pointer, target) combinations (2 × 4 for each direction), matching MATE's `cross_modal` format:
- **img2data**: pointer in {color, shape} -> target in {name, rotation, 3d_coords, size}
- **data2img**: pointer in {name, 3d_coords, rotation, size} -> target in {color, shape}
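The 16 combinations can be enumerated directly from the two rules above. The following sketch does this and adds a hypothetical validator, `is_valid_combo`, which could be used to check a sample's `task`, `pointer_attribute`, and `target_attribute` fields.

```python
from itertools import product

VISUAL = ("color", "shape")
NON_VISUAL = ("name", "rotation", "3d_coords", "size")

# img2data: a visual pointer selects the object and a non-visual target is asked for.
# data2img: the reverse direction.
COMBOS = frozenset(
    [("img2data", p, t) for p, t in product(VISUAL, NON_VISUAL)]
    + [("data2img", p, t) for p, t in product(NON_VISUAL, VISUAL)]
)


def is_valid_combo(task, pointer, target):
    """Check a sample's (task, pointer_attribute, target_attribute) triple."""
    return (task, pointer, target) in COMBOS


print(len(COMBOS))  # 16
```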
Provided by:
qinglinhou



