VisDial (Visual Dialog)
OpenDataLab, updated 2026-05-17, indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/VisDial
Description:
The Visual Dialog (VisDial) dataset contains human-annotated questions grounded in images from the MS COCO dataset. It was collected by pairing two workers on Amazon Mechanical Turk to chat about an image. One worker acted as the "questioner" and the other as the "answerer". The questioner saw only a textual description of the image (its caption from the MS COCO dataset), while the image itself stayed hidden. The questioner's task was to ask questions about this hidden image to "better imagine the scene". The answerer saw both the image and the caption and answered the questioner's questions. The two could continue the conversation, alternating questions and answers, for at most 10 rounds.
VisDial v1.0 contains 123K dialogues on MS COCO (2017 training set) for the training split, 2K dialogues on validation images for the validation split, and 8K dialogues on the test set for the test-standard split. The previously released v0.5 and v0.9 versions of VisDial, which correspond to older MS COCO splits, are considered deprecated.
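The collection protocol described above suggests a simple record layout: one caption per image, followed by up to 10 question-answer rounds. The sketch below is only an illustration of that layout. The class and field names (`Round`, `Dialogue`, `image_id`, `questioner_view`) are assumptions made for this example and are not the official VisDial file format.

```python
from dataclasses import dataclass, field
from typing import List

MAX_ROUNDS = 10  # the description caps each dialogue at 10 question-answer rounds


@dataclass
class Round:
    question: str  # asked by the questioner, who sees only the caption
    answer: str    # given by the answerer, who sees the image and the caption


@dataclass
class Dialogue:
    image_id: int  # hypothetical identifier for the underlying image
    caption: str   # image caption shown to both workers
    rounds: List[Round] = field(default_factory=list)

    def add_round(self, question: str, answer: str) -> None:
        """Append one question-answer round, enforcing the 10-round cap."""
        if len(self.rounds) >= MAX_ROUNDS:
            raise ValueError(f"a dialogue holds at most {MAX_ROUNDS} rounds")
        self.rounds.append(Round(question, answer))

    def questioner_view(self) -> List[str]:
        """Text visible to the questioner: the caption and the dialogue so far."""
        history = [self.caption]
        for r in self.rounds:
            history.extend([r.question, r.answer])
        return history


# Usage: build a two-round dialogue and inspect what the questioner can see.
dialogue = Dialogue(image_id=1, caption="A cat sitting on a sofa")
dialogue.add_round("What color is the cat?", "Black")
dialogue.add_round("Is the cat asleep?", "No")
print(dialogue.questioner_view())
```

Keeping the caption separate from the rounds mirrors the asymmetry in the protocol: the questioner's view is built only from text, and the image never enters it.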
Provider:
OpenDataLab
Created:
2022-06-07
Collected summary
Dataset introduction

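Background and challenges
Background overview
VisDial (Visual Dialog) is a dataset for the visual dialog task. It is built on MS COCO images and contains human-annotated multi-turn dialogues (up to 10 rounds), collected by having a questioner and an answerer simulate a real interaction. The dataset is fairly large, with 123K dialogues for training, 2K for validation, and 8K for testing, and is suited to research on text pre-training, natural language processing, and visual question answering. It was released in 2017 under the CC BY 4.0 license.
The above content was collected and summarized by 遇见数据集.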



