OpenGVLab/V2PE-Data
Source: Hugging Face | Updated: 2024-12-14 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/OpenGVLab/V2PE-Data
Resource description:
The V2PE-Data dataset includes two augmented long-context multimodal datasets: Long Visual Question Answering (Long-VQA) and Long Multimodal Retrieval (Long-MR). They are designed to strengthen the long-context training of vision-language models (VLMs) and to provide a systematic evaluation framework for long-context understanding beyond the range covered by existing training data. Long-VQA extends 17 widely adopted datasets into 533K samples, with sequences reaching 32K to 64K tokens, and covers tasks such as commonsense reasoning, factual knowledge, and interpretation of visual information; it evaluates how well VLMs understand and reason over long multimodal sequences. Long-MR inserts target images or text segments into sequences of interleaved images and text, assessing a model's ability to retrieve specific targets from ultra-long multimodal sequences. It has two subsets: Long-MR-32K and Long-MR-256K.
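Long-MR is described as inserting a target image or text segment into an interleaved image-text sequence and asking the model to find it, with 32K and 256K size variants. The sketch below illustrates that kind of needle-in-a-haystack construction under stated assumptions only: the `Segment` schema, `token_cost`, `IMAGE_TOKEN_COST`, `build_haystack`, and `build_sample` are hypothetical names, not the dataset's actual file format or the authors' generation pipeline, and whitespace word counting stands in for a real tokenizer.

```python
import random
from dataclasses import dataclass
from typing import Dict, List

IMAGE_TOKEN_COST = 256  # hypothetical fixed token cost per image


@dataclass(frozen=True)
class Segment:
    kind: str     # "text" or "image" (hypothetical schema)
    content: str  # text body or image path


def token_cost(seg: Segment) -> int:
    # Crude stand-in for a real tokenizer: a fixed budget per image,
    # whitespace-separated words for text (at least 1 so the loop below terminates).
    if seg.kind == "image":
        return IMAGE_TOKEN_COST
    return max(1, len(seg.content.split()))


def build_haystack(pool: List[Segment], budget: int, rng: random.Random) -> List[Segment]:
    # Draw distractor segments until the next draw would exceed the token budget.
    haystack: List[Segment] = []
    used = 0
    while True:
        seg = rng.choice(pool)
        cost = token_cost(seg)
        if used + cost > budget:
            break
        haystack.append(seg)
        used += cost
    return haystack


def build_sample(pool: List[Segment], needle: Segment, question: str,
                 budget: int, seed: int = 0) -> Dict[str, object]:
    rng = random.Random(seed)
    # Reserve room for the needle so the full sequence stays within the budget.
    haystack = build_haystack(pool, budget - token_cost(needle), rng)
    # randint is inclusive, so the needle may also land at the very end.
    position = rng.randint(0, len(haystack))
    sequence = haystack[:position] + [needle] + haystack[position:]
    return {"sequence": sequence, "question": question, "answer_index": position}


# Usage with toy data (hypothetical file names and text).
pool = [
    Segment("text", "a distractor paragraph about nothing in particular"),
    Segment("image", "distractor_0.jpg"),
]
needle = Segment("image", "target.jpg")
sample = build_sample(pool, needle, "Which position holds the target image?",
                      budget=2048, seed=7)
```

Scaling `budget` from roughly 32K to 256K tokens would mimic the two Long-MR subset sizes in spirit; the real subsets may be built quite differently.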
Provided by:
OpenGVLab



