DreamOmni2Bench
Source: ModelScope community · Updated 2025-12-05 · Indexed 2025-11-03
Download link:
https://modelscope.cn/datasets/AI-ModelScope/DreamOmni2Bench
Resource description:
# DreamOmni2: Multimodal Instruction-based Editing and Generation Benchmark
This repository contains the `DreamOmni2Bench` benchmark dataset, introduced in the paper [DreamOmni2: Multimodal Instruction-based Editing and Generation](https://huggingface.co/papers/2510.06679).
The DreamOmni2 project proposes two tasks: multimodal instruction-based editing and multimodal instruction-based generation. Both accept text and image instructions, and both cover concrete objects as well as abstract concepts. This benchmark is designed to drive the development of models that can handle these tasks.
* **Project Page**: [https://pbihao.github.io/projects/DreamOmni2/index.html](https://pbihao.github.io/projects/DreamOmni2/index.html)
* **Code**: [https://github.com/dvlab-research/DreamOmni2](https://github.com/dvlab-research/DreamOmni2)
<p align="center">
<img width="600" src="https://github.com/dvlab-research/DreamOmni2/blob/main/imgs/gallery.png">
</p>
## Introduction
DreamOmni2 addresses the limitations of existing instruction-based image editing and subject-driven generation by proposing two novel tasks: multimodal instruction-based editing and generation. These tasks support both text and image instructions and extend the scope to include both concrete and abstract concepts, greatly enhancing their practical applications.
**(1) Multimodal Instruction-based Generation**
For traditional subject-driven generation based on concrete objects, DreamOmni2 achieves the best results among open-source models, showing superior identity and pose consistency. Additionally, DreamOmni2 can reference abstract attributes (such as material, texture, makeup, hairstyle, posture, design style, artistic style, etc.), even surpassing commercial models in this area.
**(2) Multimodal Instruction-based Editing**
Beyond traditional instruction-based editing models, DreamOmni2 supports multimodal instruction editing. In everyday editing tasks, there are often elements that are difficult to describe purely with language and require reference images. Our model addresses this need, supporting references to any concrete objects and abstract attributes, with performance comparable to commercial models.
**(3) Unified Generation and Editing Model**
Building upon these two new tasks, we introduce DreamOmni2, which performs multimodal instruction-based editing and generation guided by any concrete or abstract concept. DreamOmni2 is an open-source unified generation and editing model that covers both tasks in a single system.
## Quick Start (Sample Usage)
### Requirements and Installation
First, install the necessary dependencies by cloning the `DreamOmni2` repository and installing its requirements:
```bash
git clone https://github.com/dvlab-research/DreamOmni2
cd ./DreamOmni2
pip install -r requirements.txt
```
Next, download the DreamOmni2 weights into the `models` folder:
```bash
huggingface-cli download --resume-download --local-dir-use-symlinks False xiabs/DreamOmni2 --local-dir ./models
```
### Inference
#### Multimodal Instruction-based Editing
**Note: for editing tasks, the image to be edited must be the first path passed to `--input_img_path`, because the training data uses this ordering.**
```bash
python3 inference_edit.py \
--input_img_path "example_input/edit_tests/src.jpg" "example_input/edit_tests/ref.jpg" \
--input_instruction "Make the woman from the second image stand on the road in the first image." \
--output_path "example_input/edit_tests/edit_res.png"
```
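The ordering rule above is easy to get wrong when paths are assembled programmatically. The following is a minimal sketch, not part of the DreamOmni2 repository, of a helper that always places the source image first. The function name `build_edit_command` and its parameters are hypothetical; only the script name and flags are taken from the command above.

```python
import shlex
from typing import List, Sequence


def build_edit_command(
    source_image: str,
    reference_images: Sequence[str],
    instruction: str,
    output_path: str,
    script: str = "inference_edit.py",
) -> List[str]:
    """Build the argv for the editing script with the source image first.

    Hypothetical helper: it only assembles the flags shown in the README.
    """
    if not instruction.strip():
        raise ValueError("instruction must not be empty")
    # The image to be edited always goes first; references follow in order.
    images = [source_image, *reference_images]
    return [
        "python3", script,
        "--input_img_path", *images,
        "--input_instruction", instruction,
        "--output_path", output_path,
    ]


cmd = build_edit_command(
    source_image="example_input/edit_tests/src.jpg",
    reference_images=["example_input/edit_tests/ref.jpg"],
    instruction="Make the woman from the second image stand on the road in the first image.",
    output_path="example_input/edit_tests/edit_res.png",
)
# Print a shell-quoted form so the command can be inspected before running.
print(shlex.join(cmd))
```

To execute the command, pass `cmd` to `subprocess.run(cmd, check=True)` from the repository root. Keeping the source image as a separate parameter means the ordering cannot be swapped by accident.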
#### Multimodal Instruction-based Generation
```bash
python3 inference_gen.py \
--input_img_path "example_input/gen_tests/img1.jpg" "example_input/gen_tests/img2.jpg" \
--input_instruction "In the scene, the character from the first image stands on the left, and the character from the second image stands on the right. They are shaking hands against the backdrop of a spaceship interior." \
--output_path "example_input/gen_tests/gen_res.png" \
--height 1024 \
--width 1024
```
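When several generation requests need to be run, it can help to describe each one as data and derive the command line from it. The sketch below assumes nothing beyond the flags shown above. The `GenJob` class is a hypothetical illustration and is not provided by the repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GenJob:
    """One generation request (hypothetical container, not a repository API)."""
    images: List[str]
    instruction: str
    output_path: str
    height: int = 1024
    width: int = 1024

    def to_argv(self) -> List[str]:
        # Mirror the flags of the inference_gen.py command shown above.
        return [
            "python3", "inference_gen.py",
            "--input_img_path", *self.images,
            "--input_instruction", self.instruction,
            "--output_path", self.output_path,
            "--height", str(self.height),
            "--width", str(self.width),
        ]


jobs = [
    GenJob(
        images=["example_input/gen_tests/img1.jpg", "example_input/gen_tests/img2.jpg"],
        instruction=(
            "In the scene, the character from the first image stands on the left, "
            "and the character from the second image stands on the right. They are "
            "shaking hands against the backdrop of a spaceship interior."
        ),
        output_path="example_input/gen_tests/gen_res.png",
    ),
]
commands = [job.to_argv() for job in jobs]
for argv in commands:
    print(argv)
```

Each entry in `commands` can be passed to `subprocess.run(argv, check=True)` from the repository root. Every invocation starts a fresh process, so for large batches it is probably cheaper to adapt the script to load the model once; that variant is not sketched here.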
### Web Demo
```bash
# Editing demo: GPU 0, port 7860
CUDA_VISIBLE_DEVICES=0 python web_edit.py \
--vlm_path PATH_TO_VLM \
--edit_lora_path PATH_TO_EDIT_LORA \
--server_name "0.0.0.0" \
--server_port 7860

# Generation demo: GPU 1, port 7861 (start it in a separate terminal)
CUDA_VISIBLE_DEVICES=1 python web_generate.py \
--vlm_path PATH_TO_VLM \
--gen_lora_path PATH_TO_GENERATION_LORA \
--server_name "0.0.0.0" \
--server_port 7861
```
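The two demo servers use different GPUs and ports, so they can run side by side. The sketch below is one way to start both from a single Python process. It is an assumption-laden illustration: `DemoSpec`, `demo_specs`, and `launch` are hypothetical names, and the path arguments are placeholders to be replaced with real locations.

```python
import os
import subprocess
from typing import Dict, List, NamedTuple


class DemoSpec(NamedTuple):
    gpu: str          # value assigned to CUDA_VISIBLE_DEVICES
    argv: List[str]   # command line for the demo script


def demo_specs(vlm_path: str, edit_lora_path: str, gen_lora_path: str) -> List[DemoSpec]:
    """Describe both demo servers using the flags from the commands above."""
    common = ["--vlm_path", vlm_path, "--server_name", "0.0.0.0"]
    return [
        DemoSpec("0", ["python", "web_edit.py", *common,
                       "--edit_lora_path", edit_lora_path, "--server_port", "7860"]),
        DemoSpec("1", ["python", "web_generate.py", *common,
                       "--gen_lora_path", gen_lora_path, "--server_port", "7861"]),
    ]


def launch(specs: List[DemoSpec]) -> List[subprocess.Popen]:
    """Start each demo as a child process with its own GPU assignment."""
    procs = []
    for spec in specs:
        # Copy the current environment and override only the GPU selection.
        env: Dict[str, str] = {**os.environ, "CUDA_VISIBLE_DEVICES": spec.gpu}
        procs.append(subprocess.Popen(spec.argv, env=env))
    return procs


# Placeholder paths; replace them before launching.
specs = demo_specs("PATH_TO_VLM", "PATH_TO_EDIT_LORA", "PATH_TO_GENERATION_LORA")
for spec in specs:
    print(f"GPU {spec.gpu}:", " ".join(spec.argv))
```

To start both servers, call `procs = launch(specs)` from the repository root and then wait on each process with `p.wait()`.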
## Disclaimer
This project aims to have a positive impact on AI-driven image generation. Users are free to create images with this tool, but they are expected to comply with local laws and use it responsibly. The developers assume no responsibility for misuse by users.
## Citation
If you find this project useful for your research, please consider citing our paper:
```bibtex
@misc{xia2025dreamomni2,
title={DreamOmni2: Multimodal Instruction-based Editing and Generation},
author={Bin Xia and Biao Wu and Yuhui Cao and Yangyi Chen and Shengping Zhang and Fangyun Wei and Yanzhe Wang and Zhaorui Zhong and Hanwang Zhang and Yuliang Liu},
year={2025},
eprint={2510.06679},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2510.06679},
}
```
Provider:
maas
Created:
2025-10-14



