InfoVisDial
Source: arXiv. Updated: 2023-12-21. Indexed: 2024-08-06.
Download link:
http://arxiv.org/abs/2312.13503v1
Overview:
InfoVisDial is a visual dialogue dataset created jointly by the University of Washington and Microsoft Azure AI. It is designed to provide informative answers, including when a question draws on external knowledge related to the visual content. The data are collected by combining a large-scale multimodal model (such as GIT) with a language model (such as GPT-3), which makes it possible to generate informative visual dialogue data at scale. The dataset contains multi-turn dialogues that cover scene text, visual components, and knowledge, with long free-form answers. It targets a limitation of existing datasets, whose answers are typically short and carry little information.
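The description above says only that a multimodal model and a language model are combined to collect the data. As a rough illustration of what such a two-stage pipeline and a multi-turn record could look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the record layout (`VisualDialogue`, `DialogueRound`), the `Q:`/`A:` text format, the prompt wording, and the two callables `describe_image` and `generate_text`, which stand in for a captioning model and a language model. It is not the dataset's actual schema or the authors' collection code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class DialogueRound:
    """One question/answer round (hypothetical record layout)."""
    question: str
    answer: str


@dataclass
class VisualDialogue:
    """A multi-turn dialogue tied to one image (hypothetical record layout)."""
    image_id: str
    caption: str
    rounds: List[DialogueRound] = field(default_factory=list)


def parse_rounds(text: str) -> List[DialogueRound]:
    """Parse lines of the assumed form 'Q: ...' / 'A: ...' into rounds."""
    rounds: List[DialogueRound] = []
    question: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("Q:"):
            question = line[2:].strip()
        elif line.startswith("A:") and question is not None:
            rounds.append(DialogueRound(question, line[2:].strip()))
            question = None  # wait for the next question
    return rounds


def build_dialogue(
    image_id: str,
    describe_image: Callable[[str], str],  # stand-in for a captioning model
    generate_text: Callable[[str], str],   # stand-in for a language model
) -> VisualDialogue:
    """Caption the image, prompt the language model, and parse its reply."""
    caption = describe_image(image_id)
    # Assumed prompt wording, for illustration only.
    prompt = (
        "Image description: " + caption + "\n"
        "Write a multi-turn dialogue about this image with long, "
        "informative answers.\n"
        "Format each round as 'Q: ...' followed by 'A: ...'."
    )
    reply = generate_text(prompt)
    return VisualDialogue(image_id, caption, parse_rounds(reply))
```

Passing the two models in as plain callables keeps the sketch independent of any particular library. In practice they would wrap whatever captioning and text-generation interfaces are available.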
Provided by:
University of Washington, USA; Microsoft Azure AI, USA
Created:
2023-12-21



