Omni-VFX
Source: ModelScope community. Updated 2025-12-04; indexed 2025-08-16.
Download link:
https://modelscope.cn/datasets/GD-ML/Omni-VFX
Description:
# *Omni-Effects*: Unified and Spatially-Controllable Visual Effects Generation
[Paper](https://arxiv.org/abs/2508.07981)
[Project Page](https://amap-ml.github.io/Omni-Effects.github.io/)
[Code](https://github.com/AMAP-ML/Omni-Effects)
[Dataset](https://huggingface.co/datasets/GD-ML/Omni-VFX)
[Model](https://huggingface.co/GD-ML/Omni-Effects)
# 🔥 Updates
- [2025/08] We release the CogVideoX-1.5 model fine-tuned on our Omni-VFX dataset!
- [2025/08] We release the controllable single-VFX/multi-VFX version of Omni-Effects!
# 📣 Overview
<p align="center">
<img src="teaser.jpg" width="100%"/>
</p>
Visual effects (VFX) are essential to modern cinematic production. Although video generation models offer cost-efficient solutions for VFX production, current methods are constrained by per-effect LoRA training, which limits generation to a single effect at a time. This limitation impedes applications that require spatially controllable composite effects, i.e., the concurrent generation of multiple effects at designated locations. Integrating diverse effects into a unified framework, however, faces two major challenges: interference from effect variations and spatial uncontrollability during multi-VFX joint training. To tackle these challenges, we propose *Omni-Effects*, the first unified framework capable of generating prompt-guided effects and spatially controllable composite effects. The core of our framework comprises two key innovations: (1) **LoRA-based Mixture of Experts (LoRA-MoE)**, which employs a group of expert LoRAs to integrate diverse effects within a unified model while mitigating cross-task interference; and (2) **Spatial-Aware Prompt (SAP)**, which incorporates spatial mask information into the text tokens to enable precise spatial control. Furthermore, we introduce an Independent-Information Flow (IIF) module, integrated within the SAP, that isolates the control signals of individual effects to prevent unwanted blending. To facilitate this research, we construct a comprehensive VFX dataset, *Omni-VFX*, via a novel data collection pipeline that combines image editing and First-Last Frame-to-Video (FLF2V) synthesis, and we introduce a dedicated VFX evaluation framework for validating model performance. Extensive experiments demonstrate that *Omni-Effects* achieves precise spatial control and diverse effect generation, enabling users to specify both the category and the location of desired effects.
# 🔨 Installation
```shell
git clone https://github.com/AMAP-ML/Omni-Effects.git
cd Omni-Effects
conda create -n OmniEffects python=3.10.14
conda activate OmniEffects  # install the requirements inside the new environment
pip install -r requirements.txt
```
Download the checkpoints from HuggingFace and put them in the `checkpoints` directory.
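As a rough sketch of the download step, the `huggingface-cli download` command from the `huggingface_hub` package can mirror a model repository into a local folder. The helper name `fetch_checkpoints` and the `DRY_RUN` switch are hypothetical. The snippet assumes that the released weights are hosted in the `GD-ML/Omni-Effects` model repository on HuggingFace and that the scripts read them from `checkpoints`. Check the scripts for the exact layout they expect.

```shell
# Hypothetical helper (not part of the Omni-Effects repo).
# Assumes the CLI is installed: pip install -U "huggingface_hub[cli]"
fetch_checkpoints() {
  dest="${1:-checkpoints}"   # target directory, defaults to ./checkpoints
  # Assumption: the released weights are hosted in this model repo.
  set -- huggingface-cli download GD-ML/Omni-Effects --local-dir "$dest"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"                # only print the command
  else
    "$@"                     # run the download
  fi
}
```

Calling `DRY_RUN=1 fetch_checkpoints` prints the command without touching the network, while a plain `fetch_checkpoints` runs it. Recent `huggingface_hub` releases may expose the same command under the `hf` entry point instead.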
# 🔧 Usage
## Omni-VFX dataset and prompt-guided VFX
We have released on HuggingFace the most comprehensive VFX dataset currently available. The dataset draws on three main sources: assets from the [Open-VFX dataset](https://huggingface.co/datasets/sophiaa/Open-VFX), distillations of VFX provided by [Remade-AI](https://huggingface.co/Remade-AI), and VFX videos created using FLF2V. Due to copyright restrictions, a small portion of the videos cannot be publicly shared. We also provide the CogVideoX-1.5 model fine-tuned on our Omni-VFX dataset, which enables prompt-guided VFX video generation. For the prompts, refer to `VFX-prompts.txt`.
```shell
sh scripts/prompt_guided_VFX.sh # modify the prompt and input image
```
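Before editing the script, it can help to fetch the dataset and look up a matching prompt. The sketch below is illustrative only: `fetch_dataset` and `find_prompt` are hypothetical helper names, the `data/Omni-VFX` target folder is an arbitrary choice, and the code assumes that `VFX-prompts.txt` is a plain-text file with one prompt per line in the current directory, which is not confirmed here.

```shell
# Hypothetical helpers (not part of the Omni-Effects repo).

# Mirror the dataset repo locally; needs network access and the
# huggingface_hub CLI. The default target folder is an assumption.
fetch_dataset() {
  huggingface-cli download GD-ML/Omni-VFX --repo-type dataset \
    --local-dir "${1:-data/Omni-VFX}"
}

# Case-insensitive, fixed-string keyword search over the prompt list.
# Assumption: the file holds one prompt per line.
find_prompt() {
  keyword="$1"
  file="${2:-VFX-prompts.txt}"
  if [ ! -f "$file" ]; then
    echo "prompt file not found: $file" >&2
    return 1
  fi
  grep -i -F -- "$keyword" "$file"
}
```

For example, `find_prompt explode` would list every line of `VFX-prompts.txt` that mentions "explode", and one of those lines could then be pasted into `scripts/prompt_guided_VFX.sh`.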
## SAP-guided spatially controllable VFX
SAP-guided spatially controllable VFX currently supports five effects: **"Melt it", "Levitate it", "Explode it", "Turn it into anime style" and "Change the setting to a winter scene"**.
### Single-VFX
```shell
sh scripts/inference_omnieffects_singleVFX.sh
```
### Multi-VFX
```shell
sh scripts/inference_omnieffects_multiVFX.sh
```
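Both inference scripts presumably load the weights downloaded during installation, so a small pre-flight check can catch a missing download before a long job starts. This is a sketch under that assumption: `require_checkpoints` is a hypothetical helper, and the `checkpoints` path is taken from the installation note rather than from the scripts themselves.

```shell
# Hypothetical pre-flight check: succeed only if the directory exists
# and contains at least one entry.
require_checkpoints() {
  dir="${1:-checkpoints}"
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir" >&2
    return 1
  fi
  if [ -z "$(ls -A "$dir")" ]; then
    echo "no files found in: $dir" >&2
    return 1
  fi
}
```

A guarded run could then look like `require_checkpoints && sh scripts/inference_omnieffects_multiVFX.sh`, which skips the inference script when the directory is missing or empty.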
# 📊 Quantitative Results
*Omni-Effects* achieves precise spatial control in visual effects generation.
<p align="center">
<img src="quantitative.png" width="100%"/>
</p>
# Acknowledgement
We would like to thank the authors of [CogVideoX](https://github.com/zai-org/CogVideo), [EasyControl](https://github.com/Xiaojiu-z/EasyControl) and [VFXCreator](https://huggingface.co/datasets/sophiaa/Open-VFX) for their outstanding work.
# Citation
```
@misc{mao2025omnieffects,
title={Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation},
author={Fangyuan Mao and Aiming Hao and Jintao Chen and Dongxia Liu and Xiaokun Feng and Jiashu Zhu and Meiqi Wu and Chubin Chen and Jiahong Wu and Xiangxiang Chu},
year={2025},
eprint={2508.07981},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
Provider:
maas
Created:
2025-08-12



