MIMIC-IT

arXiv, indexed 2025-09-30

Download link:

https://github.com/luodian/otter

Resource overview:

This dataset contains 2.8 million instruction-response pairs intended to improve the performance of vision-language models (VLMs) in real-world scenarios and to strengthen their perception, reasoning, and planning capabilities. Task type: multimodal instruction following.