dinhanhx/evjvqa

Name: dinhanhx/evjvqa
Creator: dinhanhx
Published: 2023-06-24 01:55:42
License: No description available

Hugging Face | Updated 2023-06-24 | Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/dinhanhx/evjvqa

Resource description:

---
language:
- en
- vi
- ja
pretty_name: EVJVQA - Multilingual Visual Question Answering
source-datasets:
- original
tags:
- evjvqa
license: unknown
task_categories:
- visual-question-answering
task_ids:
- visual-question-answering
---

# EVJVQA - Multilingual Visual Question Answering

## Abstract

Visual Question Answering (VQA) is a challenging task that combines natural language processing (NLP) and computer vision (CV), and it has attracted significant attention from researchers. English is a resource-rich language that has seen many developments in datasets and models for visual question answering. Visual question answering in other languages still needs resources and models to be developed. In addition, there is no multilingual dataset targeting the visual content of a particular country, with its own objects and cultural characteristics. To address this gap, we provide the research community with a benchmark dataset named EVJVQA, which includes more than 33,000 question-answer pairs in three languages (Vietnamese, English, and Japanese) on approximately 5,000 images taken in Vietnam, for evaluating multilingual VQA systems or models.

EVJVQA is used as a benchmark dataset for the challenge of multilingual visual question answering at the 9th Workshop on Vietnamese Language and Speech Processing (VLSP 2022). This task attracted 62 participant teams from various universities and organizations. In this article, we present details of the organization of the challenge, an overview of the methods employed by shared-task participants, and the results. The highest performances are 0.4392 in F1-score and 0.4009 in BLEU on the private test set. The multilingual QA systems proposed by the top 2 teams use ViT as the pre-trained vision model and mT5, a powerful pre-trained language model based on the transformer architecture, as the pre-trained language model.

EVJVQA is a challenging dataset that motivates NLP and CV researchers to further explore multilingual models and systems for visual question answering. We released the challenge on the Codalab evaluation system for further research.

## Links

- https://arxiv.org/abs/2302.11752
- https://codalab.lisn.upsaclay.fr/competitions/12274

Provided by:

dinhanhx

Summary of original information

EVJVQA - Multilingual Visual Question Answering

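Overview

Languages: English, Vietnamese, Japanese
Dataset name: EVJVQA
Dataset type: Multilingual visual question answering
Size: More than 33,000 question-answer pairs on approximately 5,000 images taken in Vietnam
Use case: Evaluating multilingual visual question answering systems and models
Challenge: Benchmark dataset for the multilingual visual question answering challenge at the 9th Workshop on Vietnamese Language and Speech Processing (VLSP 2022)
Participation: Attracted 62 teams from various universities and organizations
Performance: Best results are 0.4392 in F1-score and 0.4009 in BLEU on the private test set
Models used: The top 2 teams use ViT as the pre-trained vision model and mT5, a transformer-based model, as the pre-trained language model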
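
The F1-score reported above compares predicted answers against reference answers. As a rough illustration only, the sketch below computes a token-level F1 of the kind commonly used in question answering evaluation. The function name `token_f1`, the lowercasing, and the whitespace tokenization are assumptions made for this example; the challenge's exact tokenization and averaging are not described on this page, and whitespace splitting in particular would not segment Japanese answers correctly.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer.

    Illustrative only: lowercasing and whitespace tokenization are
    assumptions, not the challenge's documented procedure.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers count as a match; one empty answer is a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: each shared token counts at most as often
    # as it occurs in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A corpus-level score would typically be the mean of `token_f1` over all question-answer pairs, although that averaging choice is also an assumption here.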

Dataset details

Task category: Visual question answering
License: Unknown
Source data: Original
Tags: evjvqa
