A4Bench

Indexed from arXiv on 2025-09-30

Natural language processing

Model evaluation

Data link:

https://github.com/JunyingWang959/A4Bench/

Description:

A4Bench is a benchmark for evaluating the affordance perception of multimodal large language models (MLLMs) along two dimensions: constitutive affordance and transformative affordance. It contains 2,000 question-answer pairs in two subsets: 1,282 for constitutive affordance and 718 for transformative affordance. The questions span multiple disciplines and test how well MLLMs understand affordances across varied scenarios.
