UNK-VQA
UNK-VQA: A Dataset and A Probe into Multi-modal Large Models Abstention Ability
Dataset Overview
UNK-VQA is a visual question answering (VQA) dataset that contains unanswerable questions.
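Dataset Structure
The dataset is organized as follows:
images-train: training image folder, containing COCO-format image files (e.g., COCO-*.jpg).
images-val: validation image folder, containing COCO-format image files (e.g., COCO-*.jpg).
annt_train.json: annotation file for the training set.
annt_val.json: annotation file for the validation set.
annt_test.json: annotation file for the test set.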
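The layout above can be read with a few lines of Python. This is a minimal sketch, not an official loader: the `SPLITS` table and the `load_split` helper are hypothetical names, and leaving the test split without an image folder is an assumption made only because no test image folder is listed above.

```python
import json
from pathlib import Path

# Split -> (annotation file, image folder). File and folder names come from
# the layout above. The test split has no image folder here because none is
# listed (assumption).
SPLITS = {
    "train": ("annt_train.json", "images-train"),
    "val": ("annt_val.json", "images-val"),
    "test": ("annt_test.json", None),
}


def load_split(root, split):
    """Return (parsed annotation dict, image directory or None) for one split."""
    annt_name, image_dir = SPLITS[split]
    root = Path(root)
    with open(root / annt_name, encoding="utf-8") as f:
        data = json.load(f)
    return data, (root / image_dir if image_dir else None)
```

A call such as `load_split("path/to/UNK-VQA", "val")` would return the parsed `annt_val.json` together with the `images-val` directory, where the root path is a placeholder for wherever the dataset was unpacked.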
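Perturbation Types
The dataset contains five perturbation types:
T-1: word replacement
T-2: semantic negation
I-1: image replacement
I-2: image mask
I-3: image copy and move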
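The five codes can be kept in a small lookup table when filtering or grouping samples. The sketch below assumes that the `T` prefix marks perturbations of the question text and the `I` prefix marks perturbations of the image; that reading is inferred from the type names rather than stated explicitly. `perturbation_modality` is a hypothetical helper.

```python
# Perturbation codes and their descriptions, as listed above.
ALTER_TYPES = {
    "T-1": "word replacement",
    "T-2": "semantic negation",
    "I-1": "image replacement",
    "I-2": "image mask",
    "I-3": "image copy and move",
}

# Assumption: the letter before the dash names the perturbed modality.
MODALITY_BY_PREFIX = {"T": "text", "I": "image"}


def perturbation_modality(code):
    """Map a code such as 'T-1' to 'text' or 'image' based on its prefix."""
    if code not in ALTER_TYPES:
        raise KeyError(f"unknown perturbation code: {code}")
    return MODALITY_BY_PREFIX[code.split("-", 1)[0]]
```

Rejecting unknown codes up front keeps a typo such as `T1` from silently falling into the wrong group.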
Annotation File Format
All JSON files share a similar structure. Below is an example annotation file:
{
  "answer_map": {
    "1": "I dont know (e.g., beyond my knowledge)",
    "2": "Not sure (e.g., multiple answers)",
    "3": "I cannot answer (e.g., difficult question)"
  },
  "reason_map": {
    "1": "It has multiple plausible answers",
    "2": "It is difficult to understand",
    "3": "The image lacks important concepts/information",
    "4": "It requires higher-level knowledge to answer"
  },
  "alter_type_map": {
    "T-1": "word replacement",
    "T-2": "semantic negation",
    "I-1": "image replacement",
    "I-2": "image mask",
    "I-3": "image copy and move"
  },
  "annotation": [
    {
      "question_id": 68248,
      "question": "What is the man wearing on his lips?",
      "image_name": "COCO_val2014_000000549683.jpg",
      "answerability": {
        "binary": true,
        "other": {
          "answer": "nothing",
          "options": {
            "orig": "glasses",
            "baseline": "nothing",
            "random": "camera"
          }
        }
      },
      "alter_type": "T-1",
      "misc": {
        "question_id_origin": 549683002,
        "image_name_origin": "COCO_val2014_000000549683.jpg",
        "answer_origin": "glasses"
      }
    }
  ]
}
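The example above can be parsed and summarized as follows. This is a sketch under stated assumptions: `SAMPLE` is a trimmed copy of the example file, `flatten` and `count_by_alter_type` are hypothetical helper names, and the code treats the `other` block as optional because only one record is shown. It reads the `binary` flag without interpreting it.

```python
import json
from collections import Counter

# Trimmed copy of the example annotation file shown above.
SAMPLE = json.loads("""
{
  "alter_type_map": {"T-1": "word replacement", "I-2": "image mask"},
  "annotation": [
    {
      "question_id": 68248,
      "question": "What is the man wearing on his lips?",
      "image_name": "COCO_val2014_000000549683.jpg",
      "answerability": {
        "binary": true,
        "other": {"answer": "nothing"}
      },
      "alter_type": "T-1"
    }
  ]
}
""")


def flatten(rec):
    """Pull the commonly used fields of one annotation record into a flat dict."""
    answerability = rec["answerability"]
    other = answerability.get("other") or {}  # assumed to be optional
    return {
        "question_id": rec["question_id"],
        "question": rec["question"],
        "image_name": rec["image_name"],
        "alter_type": rec["alter_type"],
        "binary": answerability["binary"],
        "answer": other.get("answer"),
    }


def count_by_alter_type(data):
    """Count records per perturbation, keyed by the alter_type_map description."""
    names = data.get("alter_type_map", {})
    counts = Counter(rec["alter_type"] for rec in data["annotation"])
    return {names.get(code, code): n for code, n in counts.items()}


rows = [flatten(rec) for rec in SAMPLE["annotation"]]
print(rows[0]["question"], "->", rows[0]["answer"])
print(count_by_alter_type(SAMPLE))
```

Keying the counts by the description from `alter_type_map` rather than by the raw code keeps the summary readable, and falling back to the code itself avoids a failure if a file uses a code missing from the map.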




