array/cola

Name: array/cola
Creator: array
Published: 2026-04-14 18:56:34
License: No description available

Hugging Face | Updated 2026-04-14 | Indexed 2026-04-26

Download link:

https://hf-mirror.com/datasets/array/cola

Resource description:

---
license: mit
task_categories:
- image-classification
- image-to-text
- zero-shot-image-classification
language:
- en
pretty_name: COLA
size_categories:
- 10K<n<100K
tags:
- compositionality
- vision-language
- visual-genome
- clevr
- paco
configs:
- config_name: multiobjects
  data_files:
  - split: val
    path: data/multiobjects.parquet
- config_name: singleobjects_gqa
  data_files:
  - split: val
    path: data/singleobjects_gqa.parquet
- config_name: singleobjects_clevr
  data_files:
  - split: val
    path: data/singleobjects_clevr.parquet
- config_name: singleobjects_paco
  data_files:
  - split: val
    path: data/singleobjects_paco.parquet
---

# COLA: Compose Objects Localized with Attributes

Self-contained Hugging Face port of the **COLA** benchmark from the paper ["How to adapt vision-language models to Compose Objects Localized with Attributes?"](https://arxiv.org/abs/2305.03689).

- 📄 Paper: https://arxiv.org/abs/2305.03689
- 🌐 Project page: https://cs-people.bu.edu/array/research/cola/
- 💻 Original code & data: https://github.com/ArijitRay1993/COLA

This repository bundles the benchmark annotations as Parquet files and the referenced images as regular files under `images/`, so the dataset is fully self-contained.

## Dataset Structure

```
.
├── data/
│   ├── multiobjects.parquet
│   ├── singleobjects_gqa.parquet
│   ├── singleobjects_clevr.parquet
│   ├── singleobjects_paco.parquet
│   ├── singleobjects_gqa_labels.json
│   ├── singleobjects_clevr_labels.json
│   └── singleobjects_paco_labels.json
└── images/
    ├── vg/<vg_id>.jpg          # Visual Genome images (multiobjects + GQA)
    ├── clevr/valA/*.png        # CLEVR-CoGenT valA
    ├── clevr/valB/*.png        # CLEVR-CoGenT valB
    ├── coco/val2017/*.jpg      # COCO val2017 (PACO)
    └── coco/train2017/*.jpg    # COCO train2017 (PACO)
```

Image paths stored in parquet are **relative to the repository root**, e.g. `images/vg/2390970.jpg`. Load them by joining with the local clone / snapshot path.

## Configs / Splits

### `multiobjects` (210 pairs)

A hard image–caption matching task. Each row contains two images and two captions whose objects/attributes are swapped: caption 1 applies to image 1 (not image 2) and vice versa.

| Field      | Type   | Description                |
|------------|--------|----------------------------|
| `image1`   | string | Relative path to image 1   |
| `caption1` | string | Caption describing image 1 |
| `image2`   | string | Relative path to image 2   |
| `caption2` | string | Caption describing image 2 |

### `singleobjects_gqa` (2,589 rows), `singleobjects_clevr` (30,000 rows), `singleobjects_paco` (7,921 rows)

Multi-label classification across fixed vocabularies of multi-attribute object classes (320 for GQA, 96 for CLEVR, 400 for PACO). The label lists live at `data/singleobjects_<subset>_labels.json`.

| Field                | Type          | Description                                                   |
|----------------------|---------------|---------------------------------------------------------------|
| `image`              | string        | Relative path to the image                                    |
| `objects_attributes` | string (JSON) | Objects + attributes annotation (GQA and CLEVR only)          |
| `label`              | list\[int]    | Binary indicator per class (length matches labels vocabulary) |
| `hard_list`          | list\[int]    | Indicator of whether each class is "hard" for this image      |

For a given class, the paper's MAP metric is computed on images where `hard_list == 1` for that class. See `scripts/eval.py` in the [original repo](https://github.com/ArijitRay1993/COLA) for the exact metric.
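
The hard-image masking described above can be illustrated with a small, dependency-free sketch. This is not the code from `scripts/eval.py`: the function names, the toy rows, and the choice to skip classes that have no hard positive image are assumptions made for illustration, and the per-class average precision here is the plain ranking-based definition. The row dictionaries only mimic the `label` and `hard_list` fields of the single-object configs.

```python
import math


def average_precision(scores, targets):
    """Ranking-based AP for one class: mean precision at each positive's rank."""
    if sum(targets) == 0:
        return math.nan  # AP is undefined when no positive image is present
    # Sort images by descending score (stable, so ties keep input order).
    ranked = sorted(zip(scores, targets), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, target) in enumerate(ranked, start=1):
        if target:
            hits += 1
            total += hits / rank
    return total / hits


def hard_masked_map(score_rows, rows):
    """Mean AP over classes, each class restricted to rows with hard_list == 1.

    score_rows: per-image lists of class scores (hypothetical model output).
    rows: per-image dicts carrying `label` and `hard_list` lists.
    Classes without any hard positive are skipped (an assumption of this sketch).
    """
    num_classes = len(rows[0]["label"])
    per_class = []
    for c in range(num_classes):
        kept = [
            (s[c], r["label"][c])
            for s, r in zip(score_rows, rows)
            if r["hard_list"][c] == 1
        ]
        ap = average_precision([s for s, _ in kept], [t for _, t in kept])
        if not math.isnan(ap):
            per_class.append(ap)
    return sum(per_class) / len(per_class) if per_class else math.nan


# Toy data (made up): 4 images, 2 classes.
rows = [
    {"label": [1, 0], "hard_list": [1, 1]},
    {"label": [0, 1], "hard_list": [1, 1]},
    {"label": [1, 0], "hard_list": [0, 1]},
    {"label": [0, 0], "hard_list": [1, 0]},
]
scores = [[0.9, 0.8], [0.8, 0.7], [0.1, 0.6], [0.3, 0.9]]
print(hard_masked_map(scores, rows))
```

With real data, `rows` would be the rows of one `singleobjects_*` config and `scores` the model's per-class scores for the same images in the same order. Numbers meant for comparison with the paper should come from the original evaluation script.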
## Loading

```python
from datasets import load_dataset

mo   = load_dataset("array/cola", "multiobjects",        split="val")
gqa  = load_dataset("array/cola", "singleobjects_gqa",   split="val")
clv  = load_dataset("array/cola", "singleobjects_clevr", split="val")
paco = load_dataset("array/cola", "singleobjects_paco",  split="val")
```

To open an image, resolve it against the local snapshot root:

```python
from huggingface_hub import snapshot_download
from PIL import Image
import os

root = snapshot_download("array/cola", repo_type="dataset")
ex = mo[0]
img1 = Image.open(os.path.join(root, ex["image1"]))
img2 = Image.open(os.path.join(root, ex["image2"]))
```

Or, if you've cloned the repo with `git lfs`, just open paths directly:

```python
Image.open(f"{REPO_DIR}/{ex['image1']}")
```

## Licensing / Source notes

- Visual Genome, CLEVR-CoGenT, and COCO images are redistributed here under their respective original licenses. Please refer to the upstream datasets:
  - [Visual Genome](https://visualgenome.org/) (CC BY 4.0)
  - [CLEVR-CoGenT](https://cs.stanford.edu/people/jcjohns/clevr/) (CC BY 4.0)
  - [COCO 2017](https://cocodataset.org/) (CC BY 4.0 for annotations; Flickr terms for images)
- The COLA annotations (parquet files and label lists) are released under the MIT license, matching the [original COLA repo](https://github.com/ArijitRay1993/COLA).

## Citation

```bibtex
@article{ray2023cola,
  title   = {COLA: How to adapt vision-language models to Compose Objects Localized with Attributes?},
  author  = {Ray, Arijit and Radenovic, Filip and Dubey, Abhimanyu and Plummer, Bryan A. and Krishna, Ranjay and Saenko, Kate},
  journal = {arXiv preprint arXiv:2305.03689},
  year    = {2023}
}
```
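
As a follow-up to the loading snippets, the sketch below shows one way to turn a single-object row into something readable: resolve its relative image path against a local root and map the binary `label` vector back to class names. The helper names are hypothetical, and the code assumes that each `data/singleobjects_<subset>_labels.json` file is a flat JSON list of class-name strings ordered like the `label` vector. The card does not spell out that format, so check the file before relying on this.

```python
import json
from pathlib import Path


def load_label_names(root, subset):
    """Read the label vocabulary for one subset ("gqa", "clevr" or "paco").

    Assumption: the file is a flat JSON list of class-name strings.
    """
    path = Path(root) / "data" / f"singleobjects_{subset}_labels.json"
    with open(path, encoding="utf-8") as f:
        names = json.load(f)
    if not isinstance(names, list):
        raise ValueError(f"expected a JSON list in {path}, got {type(names).__name__}")
    return names


def positive_classes(row, names):
    """Return the class names whose indicator in row['label'] is 1."""
    if len(row["label"]) != len(names):
        raise ValueError("label vector and vocabulary differ in length")
    return [name for name, flag in zip(names, row["label"]) if flag == 1]


def resolve_image(root, row, key="image"):
    """Join a repository-relative image path with the local clone/snapshot root."""
    return Path(root) / row[key]
```

For example, with `root` returned by `snapshot_download` and `row = clv[0]`, calling `positive_classes(row, load_label_names(root, "clevr"))` would list the classes marked positive for that image, provided the format assumption holds.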

Provider: array