Tahsin-Mayeesha/vqa_bn
Source: Hugging Face. Updated: 2026-04-24. Indexed: 2026-04-26.
Download link:
https://hf-mirror.com/datasets/Tahsin-Mayeesha/vqa_bn
Description:
This dataset is a Bangla (Bengali) translation of the VQA v2.0 dataset, created to support research on Bengali Visual Question Generation (VQG). It contains Bangla questions and answers aligned with images, together with the original English annotator answers from VQA v2.0. The source dataset has 443.8K questions over 82.8K images in its training split and 214.4K questions over 40.5K images in its validation split. Because of computational constraints, only a subset was translated: 220K training QA pairs and 150K validation QA pairs. Only samples that map to one of 16 predefined answer categories are included; these categories are derived from the top 500 answers, which cover about 82% of VQA v2.0. Each example contains a Bangla question, a Bangla answer (the primary label), a list of English annotator answers, an image identifier, a question ID, an answer type, and a question category. The dataset is intended for Bengali Visual Question Generation, cross-lingual vision-language research, low-resource multimodal learning, and evaluation of multilingual VQA systems.
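To put the subset sizes in context, the following minimal sketch computes what share of each VQA v2.0 split was translated, using only the counts quoted in the description. The dictionary and function names are hypothetical and chosen for illustration; the result describes the translated subset before any filtering to the 16 answer categories, so the released dataset may be smaller.

```python
# Counts in thousands, taken from the description above.
VQA_V2_QUESTIONS_K = {"train": 443.8, "validation": 214.4}
TRANSLATED_QA_PAIRS_K = {"train": 220.0, "validation": 150.0}


def translated_fraction(split: str) -> float:
    """Return the share of a VQA v2.0 split that was translated (0.0 to 1.0)."""
    return TRANSLATED_QA_PAIRS_K[split] / VQA_V2_QUESTIONS_K[split]


if __name__ == "__main__":
    for split in VQA_V2_QUESTIONS_K:
        print(f"{split}: {translated_fraction(split):.1%} translated")
```

Under these figures, roughly half of the training questions and about 70% of the validation questions were translated.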
Provider:
Tahsin-Mayeesha



