Img-Diff
Source: ModelScope community; updated 2025-12-20; indexed 2024-08-31
Download link:
https://modelscope.cn/datasets/Data-Juicer/Img-Diff
Resource summary:
# Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
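We release the **Img-Diff** dataset, a high-quality synthetic dataset focused on describing the regions that differ between images, intended for training multimodal large language models (MLLMs). For more information, see the [paper](https://arxiv.org/abs/2408.04594) and the [code](https://github.com/modelscope/data-juicer/tree/ImgDiff).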
> **Abstract:** High-performance Multimodal Large Language Models (MLLMs) rely heavily on data quality. This study introduces a novel dataset named Img-Diff, designed to enhance fine-grained image recognition in MLLMs by leveraging insights from contrastive learning and image difference captioning. By analyzing object differences between similar images, we challenge models to identify both matching and distinct components. We utilize the Stable-Diffusion-XL model and advanced image editing techniques to create pairs of similar images that highlight object replacements. Our methodology includes a Difference Area Generator for identifying object differences, followed by a Difference Captions Generator for detailed difference descriptions. The result is a relatively small but high-quality dataset of "object replacement" samples. We use the proposed dataset to finetune state-of-the-art (SOTA) MLLMs such as MGM-7B, yielding comprehensive improvements in performance scores over SOTA models trained with larger-scale datasets, in numerous image difference and Visual Question Answering tasks. For instance, our trained models notably surpass the SOTA models GPT-4V and Gemini on the MMVP benchmark. In addition, we investigate alternative methods for generating image difference data through "object removal" and conduct a thorough evaluation to confirm the dataset's diversity, quality, and robustness, presenting several insights on the synthesis of such a contrastive dataset. We release our code and dataset to encourage further research and advance the field of multimodal data synthesis and the enhancement of MLLMs' fundamental capabilities for image understanding.

**Figure 1**: Examples of Img-Diff image pairs. The top row shows "object replacement" image pairs; the bottom row shows "object removal" image pairs.

**Figure 2**: Illustration of the generation process for "object replacement" data in Img-Diff.
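The Difference Area Generator mentioned in the abstract locates where two similar images differ. The sketch below is only a toy stand-in for that idea and is not the paper's method. It assumes two pixel-aligned RGB images of the same size, given as nested lists of `(r, g, b)` tuples. It flags pixels whose largest per-channel difference exceeds a hypothetical `threshold` and returns one bounding box around them. The helpers `difference_bbox` and `draw_box` are hypothetical names. `draw_box` mimics the red boxes that the file list says are drawn in `object_replacement.zip` but omitted in `object_replacement_no_bbox.zip`.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]          # (r, g, b), each 0-255
Image = List[List[Pixel]]             # rows of pixels, indexed image[y][x]
BBox = Tuple[int, int, int, int]      # (x_min, y_min, x_max, y_max), inclusive


def difference_bbox(a: Image, b: Image, threshold: int = 30) -> Optional[BBox]:
    """Toy heuristic (NOT the Img-Diff generator): box around every pixel whose
    largest per-channel difference exceeds `threshold`; None if nothing differs."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have identical dimensions")
    xs: List[int] = []
    ys: List[int] = []
    for y, (row_a, row_b) in enumerate(zip(a, b)):
        for x, (pa, pb) in enumerate(zip(row_a, row_b)):
            if max(abs(ca - cb) for ca, cb in zip(pa, pb)) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def draw_box(img: Image, box: BBox, color: Pixel = (255, 0, 0)) -> Image:
    """Return a copy of `img` with a 1-pixel outline (red by default) around `box`."""
    x0, y0, x1, y1 = box
    out = [row[:] for row in img]     # copy rows so the input is left untouched
    for x in range(x0, x1 + 1):       # top and bottom edges
        out[y0][x] = color
        out[y1][x] = color
    for y in range(y0, y1 + 1):       # left and right edges
        out[y][x0] = color
        out[y][x1] = color
    return out


if __name__ == "__main__":
    w, h = 8, 6
    before = [[(200, 200, 200)] * w for _ in range(h)]   # plain grey "scene"
    after = [row[:] for row in before]
    for y in range(2, 4):                                  # "replace" a 3x2 object
        for x in range(3, 6):
            after[y][x] = (20, 90, 160)
    box = difference_bbox(before, after)
    print(box)  # (3, 2, 5, 3)
    highlighted = draw_box(after, box)
    print(highlighted[2][3])  # (255, 0, 0)
```

A pixel-level heuristic like this assumes perfect alignment and merges all changes into a single box. It only serves to make the notion of a "difference area" concrete.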
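## File description
- `img_diff_object_replacement.json`: annotations for the "object replacement" image pairs;
- `img_diff_object_removal.json`: annotations for the "object removal" image pairs;
- `object_replacement.zip`: image archive of the "object replacement" image pairs;
- `object_removal.zip`: image archive of the "object removal" image pairs;
- `object_replacement_no_bbox.zip`: image archive of the "object replacement" image pairs with no red boxes drawn on the images, provided for tasks that do not need the difference regions highlighted.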
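Before writing a loader, it can help to check what the released files actually contain. The sketch below assumes the files listed above have been downloaded into the working directory. Apart from the files being JSON and zip archives, it makes no assumption about the annotation schema. `summarize_annotations` counts records and top-level keys when the JSON file holds a list of objects. `list_archive_images` lists the image entries inside a zip archive. Both are hypothetical helpers and are not part of the dataset or of Data-Juicer.

```python
import json
import zipfile
from collections import Counter
from pathlib import Path
from typing import Any, Dict, List, Tuple


def summarize_annotations(json_path) -> Dict[str, Any]:
    """Count records and how often each top-level key appears.
    Assumption (unverified): the file holds a JSON list of objects."""
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        # Unexpected layout: report the top-level type instead of guessing.
        return {"type": type(data).__name__, "records": None, "key_counts": {}}
    keys = Counter(k for rec in data if isinstance(rec, dict) for k in rec)
    return {"type": "list", "records": len(data), "key_counts": dict(keys)}


def list_archive_images(
    zip_path, exts: Tuple[str, ...] = (".jpg", ".jpeg", ".png")
) -> List[str]:
    """Sorted names of archive entries whose extension looks like an image."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(n for n in zf.namelist() if n.lower().endswith(exts))


if __name__ == "__main__":
    # File names come from the dataset card; missing files are skipped.
    for name in ("img_diff_object_replacement.json", "img_diff_object_removal.json"):
        if Path(name).exists():
            print(name, summarize_annotations(name))
    for name in ("object_replacement.zip", "object_replacement_no_bbox.zip",
                 "object_removal.zip"):
        if Path(name).exists():
            print(name, len(list_archive_images(name)), "image entries")
```

Reporting key frequencies first, rather than indexing fixed field names, avoids guessing at a schema the dataset card does not document.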
## Citation
If you find our work helpful for your research, please consider citing our paper:
```
@misc{jiao2024imgdiffcontrastivedatasynthesis,
title={Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models},
author={Qirui Jiao and Daoyuan Chen and Yilun Huang and Bolin Ding and Yaliang Li and Ying Shen},
year={2024},
eprint={2408.04594},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2408.04594},
}
```
Thank you for your interest in this project!
Provider: maas
Created: 2024-08-09
The above content was collected and summarized by 遇见数据集.



