SoM Dataset
Indexed from arXiv, 2025-09-30
Download link:
https://github.com/zzxslp/SoM-LLaVA
Description:
This dataset consists of images in which visual objects are marked with numeric tags. It teaches models to associate each object with its tag and to describe the tagged objects in a structured way. The dataset is intended to strengthen the visual reasoning of multimodal large language models (MLLMs), reduce their hallucinations, and let them refer to visual objects efficiently through text tokens (the tag numbers). It contains between 10,000 and 30,000 tagged images and targets visual grounding and reasoning for MLLMs.
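The following Python sketch illustrates the tag-to-object idea. It is a minimal sketch under stated assumptions. The record layout (`TaggedObject` with `tag`, `name`, and `box` fields) and the helpers `list_items_one_by_one` and `resolve_tags` are hypothetical and do not reflect the dataset's actual file format, which the description does not specify. The sketch shows two things: a structured description that enumerates objects in tag order, and textual references such as `[2]` resolved back to image regions.

```python
import re
from dataclasses import dataclass


@dataclass
class TaggedObject:
    # Hypothetical record layout; the real dataset format may differ.
    tag: int                        # numeric mark drawn on the image
    name: str                       # short description of the object
    box: tuple[int, int, int, int]  # (x0, y0, x1, y1) pixel region of the object


def list_items_one_by_one(objects: list[TaggedObject]) -> str:
    """Produce a structured description that enumerates objects in tag order."""
    ordered = sorted(objects, key=lambda o: o.tag)
    return "\n".join(f"{o.tag}. {o.name}" for o in ordered)


def resolve_tags(text: str, objects: list[TaggedObject]) -> list[TaggedObject]:
    """Map tag references like "[2]" in model output back to tagged objects.

    References to tags that do not exist in the image are skipped.
    """
    by_tag = {o.tag: o for o in objects}
    refs = [int(t) for t in re.findall(r"\[(\d+)\]", text)]
    return [by_tag[t] for t in refs if t in by_tag]


if __name__ == "__main__":
    objs = [
        TaggedObject(2, "dog", (40, 60, 120, 150)),
        TaggedObject(1, "person", (10, 5, 90, 200)),
    ]
    print(list_items_one_by_one(objs))
    print([o.name for o in resolve_tags("The [2] sits next to [1].", objs)])
```

Keying objects by their integer tag makes each textual reference resolve to exactly one image region. Unknown tags are dropped rather than raising an error.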
Provided by:
Authors of the paper



