Comparison of Model Structures.

Figshare · Updated 2025-11-13 · Indexed 2026-04-28

Download link:

https://figshare.com/articles/dataset/Comparison_of_Model_Structures_/30613862


Description:

Visual question answering (VQA) is an interdisciplinary task spanning computer vision and natural language processing. It evaluates a model's visual reasoning ability and requires integrating image information extraction with natural language understanding. Testing on a professional benchmark that controls for potential bias indicates that VQA methods based on task decomposition are a promising approach: compared with traditional VQA methods that rely only on multimodal fusion, they offer interpretability at the program execution stage and reduce dependence on data bias. Such methods decompose the task by parsing the natural-language question, usually with sequence-to-sequence networks. These parsers have limitations when faced with flexible and varied natural language, which makes it difficult to decompose the task accurately. To address this issue, we propose a Graph-to-Sequence Task Decomposition Network (Graph2Seq-TDN), which uses semantic structural information from natural language to guide the task decomposition process and improve parsing accuracy. For reasoning execution, in addition to the original symbolic reasoning execution, we propose a reasoning executor to enhance execution performance. We validated the approach on four datasets: CLEVR, CLEVR-Human, CLEVR-CoGenT, and GQA. The experimental results show that our model outperforms the comparison model in answering accuracy, program accuracy, and training cost at the same accuracy.
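To make the task-decomposition idea concrete, the following is a minimal sketch of symbolic program execution over a toy CLEVR-style scene. It is not the Graph2Seq-TDN implementation: the scene, the operation names (`filter_*`, `count`, `exist`), and the `run_program` helper are all hypothetical, and the program is written by hand rather than produced by a parser.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical toy scene: each object is a dict of symbolic attributes.
SCENE: List[Dict[str, str]] = [
    {"shape": "cube", "color": "red", "size": "large"},
    {"shape": "sphere", "color": "blue", "size": "small"},
    {"shape": "cube", "color": "blue", "size": "small"},
]

# One program step is (operation name, optional argument).
Step = Tuple[str, Optional[str]]


def run_program(program: List[Step], scene: List[Dict[str, str]]) -> Any:
    """Execute a linear program step by step over the scene.

    The intermediate state after each step can be inspected, which is
    where the interpretability of this style of method comes from.
    """
    state: Any = list(scene)
    for op, arg in program:
        if op.startswith("filter_"):
            # "filter_color" keeps objects whose "color" equals arg, etc.
            attr = op[len("filter_"):]
            state = [obj for obj in state if obj.get(attr) == arg]
        elif op == "count":
            state = len(state)
        elif op == "exist":
            state = len(state) > 0
        else:
            raise ValueError(f"unknown operation: {op}")
    return state


# Hand-written decomposition of "How many blue cubes are there?"
PROGRAM: List[Step] = [
    ("filter_color", "blue"),
    ("filter_shape", "cube"),
    ("count", None),
]

if __name__ == "__main__":
    print(run_program(PROGRAM, SCENE))  # prints 1
```

In a full pipeline the program would come from the question parser and the scene from an image-understanding module; here both are fixed by hand so that only the execution step is illustrated.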

Created:

2025-11-13
