SEACrowd/maxm

Name: SEACrowd/maxm
Creator: SEACrowd
Published: 2024-06-24 13:22:46
License: no description provided

Source: Hugging Face (updated 2024-06-24, indexed 2024-06-29)

Download link:

https://hf-mirror.com/datasets/SEACrowd/maxm

Resource overview:

MaXM is a test-only Visual Question Answering (VQA) benchmark in 7 diverse languages, including Thai. The dataset was generated with a translation-based framework for multilingual VQA (mVQA) data generation, applied to the multilingual captions in the Crossmodal-3600 dataset.

Provided by:

SEACrowd

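Summary of original information

MaXM dataset overview

Basic information

Name: MaXM
Language: Thai
Task category: Visual Question Answering
Tags: Visual Question Answering

Dataset description

MaXM is a test-only multilingual visual question answering benchmark covering 7 languages, including Thai. It was generated by applying a translation-based framework for mVQA data generation to the multilingual captions of the Crossmodal-3600 dataset.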

Supported tasks

Visual Question Answering
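
Because MaXM is test-only, it is used only to score model predictions, never for training. The following is a minimal sketch of one common way to score VQA output: exact-match accuracy against a list of acceptable answers. It is an illustration under stated assumptions, not the benchmark's official metric or schema: the function names, the list-of-lists reference layout, the normalization steps, and the toy data are all hypothetical.

```python
import unicodedata


def normalize(text: str) -> str:
    """Apply NFKC normalization, lowercase, and collapse whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).lower().split())


def exact_match_accuracy(predictions, references):
    """Return the fraction of predictions matching any acceptable answer.

    predictions: list of predicted answer strings.
    references: list of lists of acceptable answers (assumed layout,
        not taken from the MaXM schema).
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = 0
    for pred, answers in zip(predictions, references):
        # A prediction counts as correct if it equals any reference
        # answer after normalization.
        if normalize(pred) in {normalize(a) for a in answers}:
            hits += 1
    return hits / len(predictions)


# Toy data invented for illustration only (not drawn from MaXM).
preds = ["แมว", "Two ", "red"]
refs = [["แมว"], ["two", "2"], ["blue"]]
score = exact_match_accuracy(preds, refs)
print(f"exact-match accuracy: {score:.2f}")
```

Here two of the three toy predictions match a reference after normalization. A real evaluation would substitute the benchmark's actual answer fields and whatever matching rule its authors specify.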

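Dataset versions

Source version: 1.0.0
SEACrowd version: 2024.06.20

Dataset license

License type: Other
Terms of use: The dataset may be used freely for any purpose, though crediting Google LLC as the data source is recommended. The dataset is provided "AS IS" without any warranty, express or implied. Google accepts no liability for any direct or indirect damages resulting from use of the dataset.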

Citation

If you use the MaXM dataset, please cite the following:

@inproceedings{changpinyo-etal-2023-maxm,
    title = "{M}a{XM}: Towards Multilingual Visual Question Answering",
    author = "Changpinyo, Soravit and Xue, Linting and Yarom, Michal and Thapliyal, Ashish and Szpektor, Idan and Amelot, Julien and Chen, Xi and Soricut, Radu",
    editor = "Bouamor, Houda and Pino, Juan and Bali, Kalika",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-emnlp.176",
    doi = "10.18653/v1/2023.findings-emnlp.176",
    pages = "2667--2682",
    abstract = "Visual Question Answering (VQA) has been primarily studied through the lens of the English language. Yet, tackling VQA in other languages in the same manner would require a considerable amount of resources. In this paper, we propose scalable solutions to multilingual visual question answering (mVQA), on both data and modeling fronts. We first propose a translation-based framework to mVQA data generation that requires much less human annotation efforts than the conventional approach of directly collection questions and answers. Then, we apply our framework to the multilingual captions in the Crossmodal-3600 dataset and develop an efficient annotation protocol to create MaXM, a test-only VQA benchmark in 7 diverse languages. Finally, we develop a simple, lightweight, and effective approach as well as benchmark state-of-the-art English and multilingual VQA models. We hope that our benchmark encourages further research on mVQA.",
}
