VASR
OpenDataLab | Updated 2026-05-17 | Indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/VASR
Description:
A core process in human cognition is analogical mapping: the ability to identify a similar relational structure between different situations. We introduce a novel task, Visual Analogies of Situation Recognition (VASR), which adapts the classic word-analogy task to the visual domain. Given a triplet of images, the task is to select the candidate image B' that completes the analogy (A is to A' as B is to what?). Unlike previous work on visual analogy, which focused on simple image transformations, we tackle complex analogies that require scene understanding.
We leverage situation recognition annotations and the CLIP model to generate a large set of 500k candidate analogies. Crowdsourced annotations for a sample of the data indicate that humans agree with the dataset label 80% of the time (chance level: 25%). Furthermore, we use human annotations to create a gold-standard dataset of 3,820 validated analogies. Our experiments show that state-of-the-art models do well when distractors are chosen randomly (~86%), but struggle with carefully chosen distractors (~53%, compared to 90% human accuracy). We hope that our dataset will encourage the development of new analogy-making models.
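As a rough illustration of the task format, the sketch below picks B' from four candidates using the embedding-arithmetic heuristic familiar from word analogies: score each candidate against B + (A' - A). This is an assumption made for illustration only, not necessarily how the models in the description were evaluated. The toy vectors stand in for image embeddings (for example, from an image encoder such as CLIP), and the names `solve_analogy` and `accuracy` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def solve_analogy(a: Vector, a_prime: Vector, b: Vector,
                  candidates: List[Vector]) -> int:
    """Return the index of the candidate closest to b + (a_prime - a).

    Hypothetical helper: the inputs are assumed to be precomputed
    image embeddings of equal dimension.
    """
    target = [bi + (api - ai) for ai, api, bi in zip(a, a_prime, b)]
    scores = [cosine(target, c) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of analogies where the predicted index matches the label."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy 3-d "embeddings" (made up for this example, not real data).
a, a_prime, b = [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]
candidates = [
    [0.0, 0.0, 1.0],  # distractor: identical to B
    [0.0, 1.0, 1.0],  # B with the A -> A' change applied
    [1.0, 0.0, 0.0],  # distractor: identical to A
    [1.0, 1.0, 0.0],  # distractor: identical to A'
]
prediction = solve_analogy(a, a_prime, b, candidates)
chance_level = 1 / len(candidates)  # 0.25 with four candidates
```

With four candidates per analogy, uniform random guessing gives the 25% chance level quoted in the description.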
Provider:
OpenDataLab
Created:
2023-02-06
Collected Summary
Dataset Introduction

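Background and Challenges
Background Overview
VASR is a visual analogy dataset that evaluates how well models understand analogies between complex scenes through an image-triplet task: given the relation between A and A', infer the image B' that stands in the same relation to B. The dataset contains roughly 500k candidate analogies and 3,820 human-validated gold-standard samples. Human annotators agree with the dataset labels 80% of the time, yet current state-of-the-art models reach only 53% accuracy against carefully chosen distractors, well below the 90% human accuracy, which highlights how challenging the task remains for AI. The dataset was released in 2022 by the Hebrew University of Jerusalem to advance the development of analogy models.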
The summary above was collected and generated by 遇见数据集.



