AGQA-Decomp
OpenDataLab | Updated 2026-05-17 | Indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/AGQA-Decomp
Description:
Recent video question answering benchmarks indicate that state-of-the-art models struggle to answer compositional questions. However, it remains unclear which types of compositional reasoning cause models to mispredict. It is also difficult to tell whether models arrive at answers through compositional reasoning or by exploiting data biases. In this work, we develop a question decomposition engine that programmatically deconstructs a compositional question into a directed acyclic graph (DAG) of sub-questions. The graph is designed so that each parent question is a composition of its children. We present AGQA-Decomp, a benchmark containing 2.3M question graphs, with an average of 11.49 sub-questions per graph and 4.55M new sub-questions in total. Using the question graphs, we evaluate three state-of-the-art models with a suite of novel compositional consistency metrics. We find that models either cannot reason correctly through most compositions or rely on incorrect reasoning to reach answers, frequently contradicting themselves or achieving high accuracy even when they fail at intermediate reasoning steps.
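The idea of a question graph and a consistency check over it can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `QuestionNode` class, the example questions, and the "right for the right reasons" split are hypothetical and are not the benchmark's actual data format or its metric definitions. The sketch only shows how a parent question that is answered correctly can be separated according to whether all of its sub-questions were also answered correctly.

```python
from dataclasses import dataclass, field


@dataclass
class QuestionNode:
    """One question in a hypothetical question graph (not the real AGQA-Decomp schema)."""
    qid: str
    text: str
    gold: str                                      # ground-truth answer
    children: list = field(default_factory=list)   # qids of direct sub-questions


def topological_order(graph):
    """Return qids with every child listed before its parent; raise on a cycle."""
    order, state = [], {}  # state: 1 = currently visiting, 2 = finished

    def visit(qid):
        if state.get(qid) == 2:
            return
        if state.get(qid) == 1:
            raise ValueError(f"cycle detected at {qid}")
        state[qid] = 1
        for child in graph[qid].children:
            visit(child)
        state[qid] = 2
        order.append(qid)

    for qid in graph:
        visit(qid)
    return order


def consistency_report(graph, predictions):
    """Split correctly answered parents by whether all direct children were also correct."""
    correct = {q: predictions.get(q) == node.gold for q, node in graph.items()}
    right_for_right_reasons, right_for_wrong_reasons = [], []
    for qid in topological_order(graph):
        node = graph[qid]
        if not node.children or not correct[qid]:
            continue  # leaves and wrongly answered parents are not classified here
        if all(correct[c] for c in node.children):
            right_for_right_reasons.append(qid)
        else:
            right_for_wrong_reasons.append(qid)
    return right_for_right_reasons, right_for_wrong_reasons


# Hypothetical three-node graph: one compositional question with two sub-questions.
graph = {
    "root": QuestionNode("root", "Did the person hold a cup before opening the door?",
                         "yes", ["cup", "door"]),
    "cup": QuestionNode("cup", "Did the person hold a cup?", "yes"),
    "door": QuestionNode("door", "Did the person open the door?", "yes"),
}

# Hypothetical model output: the parent is right, but one sub-question is wrong.
predictions = {"root": "yes", "cup": "yes", "door": "no"}

consistent, inconsistent = consistency_report(graph, predictions)
print("right with all sub-questions right:", consistent)
print("right despite a wrong sub-question:", inconsistent)
```

In this toy case the parent question lands in the second list: the model answers it correctly while contradicting itself on a sub-question, which is the kind of behavior the description says the benchmark is meant to expose.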
Provider:
OpenDataLab
Created:
2023-02-13




