qinglinhou/CLEVR-MATE

Name: qinglinhou/CLEVR-MATE
Creator: qinglinhou
Published: 2026-03-17 04:30:34
License: No description available

Hugging Face | Updated 2026-03-17 | Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/qinglinhou/CLEVR-MATE

Resource overview:

---
configs:
- config_name: 2d
  data_files:
  - split: train
    path: 2d/samples.jsonl
- config_name: pyrender
  data_files:
  - split: train
    path: pyrender/samples.jsonl
- config_name: blender
  data_files:
  - split: train
    path: blender/samples.jsonl
- config_name: 2d_strict
  data_files:
  - split: train
    path: 2d_strict/samples.jsonl
- config_name: pyrender_strict
  data_files:
  - split: train
    path: pyrender_strict/samples.jsonl
- config_name: blender_strict
  data_files:
  - split: train
    path: blender_strict/samples.jsonl
dataset_info:
  features:
  - name: id
    dtype: string
  - name: image_path
    dtype: string
  - name: scene
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: task
    dtype: string
  - name: object_count
    dtype: int32
  - name: pointer_attribute
    dtype: string
  - name: target_attribute
    dtype: string
---

# CLEVR-MATE

MATE-like cross-modal entity linking dataset generated from CLEVR-style scenes for probe training.

## Variants

| Config | Renderer | Scenes | Samples | Strict | Description |
|--------|----------|--------|---------|--------|-------------|
| 2d | Pillow | 3,000 | 18,000 | No | Fast 2D shape rendering |
| pyrender | pyrender | 3,000 | 18,000 | No | Offscreen 3D rendering |
| blender | Blender Cycles | 3,000 | 18,000 | No | Photorealistic ray-traced rendering |
| 2d_strict | Pillow | 3,000 | 18,000 | Yes | 2D, strict cross-modal |
| pyrender_strict | pyrender | 3,000 | 18,000 | Yes | 3D, strict cross-modal |
| blender_strict | Blender Cycles | 3,000 | 18,000 | Yes | Photorealistic, strict cross-modal |

### MATE-consistent vs Strict

- **MATE-consistent** (default): Matches MATE's convention: only the pointer/target visual attribute (color or shape) is stripped from the scene JSON. The other visual attribute remains, which may allow a "bridging shortcut" (e.g., if color is the pointer, shape is still in the JSON and visible in the image, so the model could match objects via shape alone without true cross-modal binding).
- **Strict**: Both color AND shape are stripped from the scene JSON, forcing the model to rely on non-visual attributes (name, size, rotation, 3d_coords) for entity linking. This eliminates the bridging shortcut and tests true cross-modal binding ability.

The strict variants share the same images as their non-strict counterparts; only the `samples.jsonl` (scene JSON filtering) differs.

## Usage

```python
from datasets import load_dataset

# Load a specific variant
ds = load_dataset("qinglinhou/CLEVR-MATE", "blender")

# Load the strict variant
ds_strict = load_dataset("qinglinhou/CLEVR-MATE", "blender_strict")
```

## Fields

- **id**: unique hex identifier
- **image_path**: relative path to the scene image (e.g. `images/scene_000042.png`)
- **scene**: filtered scene JSON (Python repr format, visual attributes stripped)
- **question**: cross-modal question (e.g. "What is the name of the yellow colored object?")
- **answer**: gold answer
- **task**: `img2data` or `data2img`
- **object_count**: number of objects in the scene (3-8)
- **pointer_attribute**: attribute used to identify the object
- **target_attribute**: attribute to find

## Task Types

16 unique (task, pointer, target) combinations matching MATE cross_modal format:

- **img2data**: pointer in {color, shape} -> target in {name, rotation, 3d_coords, size}
- **data2img**: pointer in {name, 3d_coords, rotation, size} -> target in {color, shape}
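
The task combinations and the two scene-filtering modes described above can be sketched in a few lines of Python. This is only an illustration, not the dataset's actual generation code: the helper names `task_combinations` and `strip_visual`, the example object, and the assumption that a parsed scene is a list of per-object dicts are all hypothetical. The one detail taken from the card is that `scene` is stored in Python repr format, which is why `ast.literal_eval` is used rather than `json.loads`.

```python
import ast
from itertools import product

VISUAL = ("color", "shape")
NON_VISUAL = ("name", "rotation", "3d_coords", "size")


def task_combinations():
    """Enumerate the (task, pointer, target) triples listed under Task Types."""
    img2data = [("img2data", p, t) for p, t in product(VISUAL, NON_VISUAL)]
    data2img = [("data2img", p, t) for p, t in product(NON_VISUAL, VISUAL)]
    return img2data + data2img


def strip_visual(objects, pointer, target, strict=False):
    """Hypothetical re-creation of the scene filtering.

    Default (MATE-consistent): drop only the visual attribute that serves
    as pointer or target. Strict: drop both color and shape.
    """
    if strict:
        drop = set(VISUAL)
    else:
        drop = {attr for attr in (pointer, target) if attr in VISUAL}
    return [{k: v for k, v in obj.items() if k not in drop} for obj in objects]


# Hypothetical scene string; the real per-object schema may differ.
scene_repr = (
    "[{'name': 'Object_0', 'color': 'yellow', 'shape': 'cube', "
    "'size': 'large', 'rotation': 45.0, '3d_coords': [1.0, -2.0, 0.35]}]"
)
objects = ast.literal_eval(scene_repr)  # repr format, so not json.loads

combos = task_combinations()
default_view = strip_visual(objects, pointer="color", target="name")
strict_view = strip_visual(objects, pointer="color", target="name", strict=True)

print(len(combos))
print(sorted(default_view[0]))
print(sorted(strict_view[0]))
```

In the default view the object still carries `shape`, which is the bridging shortcut the card warns about; in the strict view only the non-visual attributes remain.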

Provided by:

qinglinhou
