MIMIC-IT
Source: arXiv. Indexed 2025-09-30.
Download link:
https://github.com/luodian/otter
Overview:
MIMIC-IT contains 2.8 million instruction-response pairs for multimodal instruction following. It aims to improve the performance of vision-language models (VLMs) in real-world scenarios and to strengthen their perception, reasoning, and planning capabilities.



