colorswap
ModelScope community · updated 2025-11-27 · indexed 2025-11-03
Download link: https://modelscope.cn/datasets/stanfordnlp/colorswap
Description:
# ColorSwap: A Color and Word Order Dataset for Multimodal Evaluation
## Dataset Description
ColorSwap is a dataset for assessing and improving how well multimodal models match objects with their colors. It comprises 2,000 unique image-caption pairs, grouped into 1,000 examples. Each example contains one caption-image pair and its "color-swapped" counterpart. Crucially, the two captions in an example contain the same words, but the color words are rearranged so that they modify different objects. The dataset was created through a novel blend of automated caption and image generation with humans in the loop.
Paper: Coming soon!
## Usage
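You can download the dataset directly from the Hugging Face Hub with the following code:
```python
from datasets import load_dataset

# On `datasets` releases older than 2.14, pass `use_auth_token=True` instead of `token=True`.
dataset = load_dataset("stanfordnlp/colorswap", token=True)
```
Make sure the `datasets` library is installed and that you are logged in (for example via `huggingface-cli login`). The `token=True` argument (`use_auth_token=True` on older releases) authenticates the request with your saved Hugging Face credentials.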
An example of the dataset is as follows:
```python
[
{
'id': 0,
'image_1': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=1024x1024 at 0x14D908B20>,
'image_2': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=1024x1024 at 0x14D9DCE20>,
'caption_1': 'someone holding a yellow umbrella wearing a white dress',
'caption_2': 'someone holding a white umbrella wearing a yellow dress',
'image_source': 'midjourney',
'caption_source': 'human'
}
...
]
```
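The color-swap property can be checked directly on the captions. The sketch below is illustrative only: the helper names `is_word_permutation` and `swapped_positions` are hypothetical and not part of the dataset or its tooling, and the position-wise comparison assumes the two captions align word for word, as they do in the record shown above.

```python
from collections import Counter


def is_word_permutation(caption_a: str, caption_b: str) -> bool:
    """Return True if both captions use exactly the same multiset of words."""
    return Counter(caption_a.lower().split()) == Counter(caption_b.lower().split())


def swapped_positions(caption_a: str, caption_b: str):
    """Return (index, word_a, word_b) for each position where the captions differ.

    Assumes the captions have the same length and differ only by swapped words.
    """
    words_a, words_b = caption_a.split(), caption_b.split()
    return [(i, a, b) for i, (a, b) in enumerate(zip(words_a, words_b)) if a != b]


# Captions copied from the example record above.
record = {
    "caption_1": "someone holding a yellow umbrella wearing a white dress",
    "caption_2": "someone holding a white umbrella wearing a yellow dress",
}

same_words = is_word_permutation(record["caption_1"], record["caption_2"])
diffs = swapped_positions(record["caption_1"], record["caption_2"])
print(same_words)  # True
print(diffs)       # [(3, 'yellow', 'white'), (7, 'white', 'yellow')]
```

Because the captions share every word, a model that ignores word order cannot tell them apart from the text alone.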
## Evaluations
[This Google Colab](https://colab.research.google.com/drive/1EWPsSklfq49WiX2nUyOTmKZftU0AC4YL?usp=sharing) showcases our evaluations of image-text matching (ITM) models.
For the vision-language model (VLM) evaluations, please refer to our GitHub repository: [ColorSwap](https://github.com/Top34051/colorswap).
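The authors' actual evaluation code lives in the Colab and repository linked above. As a rough illustration of how a paired example can be scored, the sketch below uses the text, image, and group scores popularized by the Winoground benchmark. That this matches the authors' metrics is an assumption. The `scorer` callable is a hypothetical stand-in for any ITM model that returns a higher number for a better caption-image match, and the toy images are plain strings rather than PIL images.

```python
from typing import Callable, Dict, List

# Hypothetical scorer signature: (caption, image) -> match score, higher is better.
Scorer = Callable[[str, object], float]


def score_example(example: Dict[str, object], scorer: Scorer) -> Dict[str, bool]:
    """Score one example with Winoground-style text, image, and group criteria."""
    s11 = scorer(example["caption_1"], example["image_1"])
    s12 = scorer(example["caption_1"], example["image_2"])
    s21 = scorer(example["caption_2"], example["image_1"])
    s22 = scorer(example["caption_2"], example["image_2"])
    text_ok = s11 > s21 and s22 > s12   # for each image, the matching caption wins
    image_ok = s11 > s12 and s22 > s21  # for each caption, the matching image wins
    return {"text": text_ok, "image": image_ok, "group": text_ok and image_ok}


def aggregate(examples: List[Dict[str, object]], scorer: Scorer) -> Dict[str, float]:
    """Return the fraction of examples passing each criterion."""
    results = [score_example(ex, scorer) for ex in examples]
    return {key: sum(r[key] for r in results) / len(results)
            for key in ("text", "image", "group")}


# Toy data: made-up scores standing in for a real model's outputs.
toy_scores = {
    ("cap A1", "img A1"): 0.9, ("cap A1", "img A2"): 0.2,
    ("cap A2", "img A1"): 0.3, ("cap A2", "img A2"): 0.8,
    ("cap B1", "img B1"): 0.6, ("cap B1", "img B2"): 0.5,
    ("cap B2", "img B1"): 0.7, ("cap B2", "img B2"): 0.8,
}
toy_examples = [
    {"caption_1": "cap A1", "caption_2": "cap A2", "image_1": "img A1", "image_2": "img A2"},
    {"caption_1": "cap B1", "caption_2": "cap B2", "image_1": "img B1", "image_2": "img B2"},
]

summary = aggregate(toy_examples, lambda caption, image: toy_scores[(caption, image)])
print(summary)  # {'text': 0.5, 'image': 1.0, 'group': 0.5}
```

The group criterion is the strictest of the three, since an example passes only when all four pairwise comparisons come out in favor of the matching pair.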
## Citation
If you find our work useful, please cite the following paper:
```
@article{burapacheep2024colorswap,
author = {Jirayu Burapacheep and Ishan Gaur and Agam Bhatia and Tristan Thrush},
title = {ColorSwap: A Color and Word Order Dataset for Multimodal Evaluation},
journal = {arXiv},
year = {2024},
}
```
Provider: maas
Created: 2025-10-04



