dinhanhx/VQAv2-vi
Source: Hugging Face. Updated 2023-09-21; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/dinhanhx/VQAv2-vi
Description:
---
language:
- en
- vi
pretty_name: VQAv2 in Vietnamese
source-datasets:
- VQAv2
tags:
- VQAv2-vi
- VQA
license: unknown
task_categories:
- visual-question-answering
task_ids:
- visual-question-answering
---
# VQAv2 in Vietnamese
This is a Google-translated Vietnamese version of [VQAv2](https://visualqa.org/). The Vietnamese version was built as follows:
- In the `en/` folder:
  - Download `v2_OpenEnded_mscoco_train2014_questions.json` and `v2_mscoco_train2014_annotations.json` from [VQAv2](https://visualqa.org/).
  - Remove the `answers` key from every entry under the `annotations` key of `v2_mscoco_train2014_annotations.json`. I only use the `multiple_choice_answer` key of each entry under `annotations`. Let's call the new file `v2_OpenEnded_mscoco_train2014_answers.json`.
  - Using a [set data structure](https://docs.python.org/3/tutorial/datastructures.html#sets), I generate `question_list.txt` and `answer_list.txt`, which contain only unique texts: 152050 unique questions and 22531 unique answers from 443757 image-question-answer triplets.
- In the `vi/` folder:
  - By translating the two `.txt` files in `en/`, I generate `answer_list.jsonl` and `question_list.jsonl`. In each entry of each file, the key is the original English text and the value is the translated Vietnamese text.
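The `en/` steps above can be sketched in Python as follows. This is a minimal sketch, not the author's script: it assumes the standard VQAv2 JSON layout (a top-level `questions` list whose items carry a `question` string, and a top-level `annotations` list whose items carry `answers` and `multiple_choice_answer`). The helper names `strip_answers`, `unique_texts`, `write_lines` and `build_en_folder` are hypothetical, and the lists are sorted only to make the output deterministic; the published files may use a different order.

```python
import json
from pathlib import Path


def strip_answers(annotations_doc: dict) -> dict:
    """Copy an annotations document, dropping each entry's `answers` list.

    `multiple_choice_answer` and all other keys are kept; the input is not mutated.
    """
    out = dict(annotations_doc)
    out["annotations"] = [
        {key: value for key, value in ann.items() if key != "answers"}
        for ann in annotations_doc["annotations"]
    ]
    return out


def unique_texts(questions_doc: dict, answers_doc: dict) -> tuple[list[str], list[str]]:
    """Deduplicate question and answer strings with sets (sorted for stable output)."""
    questions = {q["question"] for q in questions_doc["questions"]}
    answers = {ann["multiple_choice_answer"] for ann in answers_doc["annotations"]}
    return sorted(questions), sorted(answers)


def write_lines(path: Path, lines: list[str]) -> None:
    """Hypothetical helper: write one text per line (assumes no text contains a newline)."""
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def build_en_folder(en_dir: Path) -> None:
    """Hypothetical driver: derive the answers file and the two unique-text lists."""
    questions_doc = json.loads(
        (en_dir / "v2_OpenEnded_mscoco_train2014_questions.json").read_text(encoding="utf-8")
    )
    annotations_doc = json.loads(
        (en_dir / "v2_mscoco_train2014_annotations.json").read_text(encoding="utf-8")
    )
    answers_doc = strip_answers(annotations_doc)
    (en_dir / "v2_OpenEnded_mscoco_train2014_answers.json").write_text(
        json.dumps(answers_doc), encoding="utf-8"
    )
    questions, answers = unique_texts(questions_doc, answers_doc)
    write_lines(en_dir / "question_list.txt", questions)
    write_lines(en_dir / "answer_list.txt", answers)
```

After downloading the two VQAv2 files into `en/`, calling `build_en_folder(Path("en"))` would write the three derived files next to them.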
To load the Vietnamese version in your code, you need the original English version. Then use the English text as a key to retrieve the Vietnamese value from `answer_list.jsonl` and `question_list.jsonl`. I provide both the English and the Vietnamese versions.
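A minimal sketch of that lookup, assuming each line of the `.jsonl` files is a JSON object mapping English text to its Vietnamese translation (the exact line layout is not spelled out, so all pairs found on a line are merged). The function names are hypothetical and this is not the linked `apply_translate_vqav2.py`; texts with no entry in the map are left in English.

```python
import json
from typing import Iterable


def load_translation_map(lines: Iterable[str]) -> dict[str, str]:
    """Merge JSONL entries of the form {"<English>": "<Vietnamese>"} into one dict."""
    mapping: dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            mapping.update(json.loads(line))
    return mapping


def translate_questions(questions_doc: dict, question_map: dict[str, str]) -> dict:
    """Copy a VQAv2 questions document with each `question` replaced by its translation."""
    out = dict(questions_doc)
    out["questions"] = [
        {**q, "question": question_map.get(q["question"], q["question"])}
        for q in questions_doc["questions"]
    ]
    return out


def translate_answers(answers_doc: dict, answer_map: dict[str, str]) -> dict:
    """Copy an answers document with each `multiple_choice_answer` translated."""
    out = dict(answers_doc)
    out["annotations"] = [
        {
            **ann,
            "multiple_choice_answer": answer_map.get(
                ann["multiple_choice_answer"], ann["multiple_choice_answer"]
            ),
        }
        for ann in answers_doc["annotations"]
    ]
    return out
```

Because `load_translation_map` accepts any iterable of lines, an open file such as `open("vi/question_list.jsonl", encoding="utf-8")` can be passed to it directly; the resulting dict is then applied to the English JSON document loaded from `en/`.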
Please refer to [this code](https://github.com/dinhanhx/velvet/blob/main/scripts/apply_translate_vqav2.py) to apply translation.
Provided by:
dinhanhx
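Summary of original information
Dataset overview
Dataset name
VQAv2 in Vietnamese
Languages
- English (en)
- Vietnamese (vi)
Source datasets
- VQAv2
Tags
- VQAv2-vi
- VQA
License
Unknown
Task categories
- Visual question answering (visual-question-answering)
Task IDs
- visual-question-answering
Dataset construction process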
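- English part (`en/` folder)
  - Download `v2_OpenEnded_mscoco_train2014_questions.json` and `v2_mscoco_train2014_annotations.json`.
  - Remove the `answers` key under the `annotations` key from `v2_mscoco_train2014_annotations.json`, using only `multiple_choice_answer` under `annotations`. The new file is named `v2_OpenEnded_mscoco_train2014_answers.json`.
  - Use a set data structure to generate `question_list.txt` and `answer_list.txt`, containing 152050 unique questions and 22531 unique answers from 443757 image-question-answer triplets.
- Vietnamese part (`vi/` folder)
  - Translate the two `.txt` files in `en/` to generate `answer_list.jsonl` and `question_list.jsonl`. In each entry of each file, the key is the original English text and the value is the Vietnamese translation.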
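Usage guide
- Loading the Vietnamese version in code requires the original English version. Use the English text as a key to retrieve the Vietnamese value from `answer_list.jsonl` and `question_list.jsonl`.
- Refer to [this code](https://github.com/dinhanhx/velvet/blob/main/scripts/apply_translate_vqav2.py) to apply the translation.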



