Intel/SK-VQA
Source: Hugging Face. Updated: 2025-06-26. Indexed: 2025-08-09.
Download link:
https://hf-mirror.com/datasets/Intel/SK-VQA
Resource description:
SK-VQA is a large-scale synthetic multimodal dataset containing over 2 million visual question-answer pairs, each paired with context documents that contain the information needed to answer the questions. The dataset is designed to address the critical need for training and evaluating multimodal large language models (MLLMs) in context-augmented generation settings, particularly for retrieval-augmented generation (RAG) systems. It enables training MLLMs for contextual reasoning, where models learn to ground answers in provided context documents and images. Models trained on SK-VQA demonstrate superior out-of-domain generalization compared to those trained on existing datasets. It also provides a challenging benchmark for evaluating state-of-the-art models on context-augmented VQA tasks.
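The description above pairs each question with an image and a context document. A minimal sketch of what such a record and a context-augmented prompt could look like follows. The class name, field names, prompt wording, and sample record are all hypothetical, invented for illustration; they are not taken from the dataset, whose actual schema should be checked on the dataset card.

```python
from dataclasses import dataclass


@dataclass
class ContextVQAExample:
    # Hypothetical field names; the real SK-VQA schema may differ.
    image_path: str
    context: str
    question: str
    answer: str


def build_prompt(example: ContextVQAExample) -> str:
    """Format the text side of a context-augmented VQA query.

    The image itself would be passed to the model separately.
    """
    return (
        "Answer the question using the image and the context document.\n"
        f"Context: {example.context}\n"
        f"Question: {example.question}\n"
        "Answer:"
    )


# Made-up record for illustration only, not an actual dataset entry.
example = ContextVQAExample(
    image_path="images/bridge.jpg",
    context="The bridge shown opened in 1937 and spans the Golden Gate strait.",
    question="In what year did the bridge in the image open?",
    answer="1937",
)
prompt = build_prompt(example)
print(prompt)
```

Keeping the context document in the prompt, rather than relying on what the model memorized, is what lets the answer be grounded in the supplied text and image.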
Provider:
Intel



