Visual Abductive Reasoning (VAR)
Source: OpenDataLab | Updated: 2026-05-24 | Indexed: 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/Visual_Abductive_Reasoning_VAR
Description:
We introduce a new task and dataset, Visual Abductive Reasoning (VAR), for examining the abductive reasoning ability of machine intelligence in everyday visual situations. Given an incomplete set of visual events, an AI system must not only describe what is observed but also infer the hypothesis that best explains the visual premises. Based on our large-scale VAR dataset, we design a strong baseline model, Reasoner (a causal-and-cascaded reasoning Transformer). First, to capture the causal structure of the observations, the encoder adopts a contextualized directional position embedding strategy, which yields discriminative representations for the premises and the hypothesis. Then, multiple decoders are cascaded to generate and progressively refine the premise and hypothesis sentences. The prediction scores of the sentences guide cross-sentence information flow during the cascaded reasoning procedure. Our VAR benchmark results show that Reasoner surpasses many well-known video-language models, yet still falls far behind human performance. This work is expected to foster future efforts in the reasoning-beyond-observation paradigm.
Provider:
OpenDataLab
Created:
2023-02-13
Collected Summary
Dataset Introduction

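Background and Challenges
Background Overview
Visual Abductive Reasoning (VAR) is a multimodal dataset designed to evaluate how well machines perform abductive reasoning over visual events. It was released in 2022 by ETH Zurich, the University of Technology Sydney, and Zhejiang University. The dataset comes with a baseline model, Reasoner, which generates and refines hypotheses that explain the visual premises. The model currently outperforms many video-language models but has not yet reached human-level performance.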
The summary above was collected and generated by 遇见数据集.



