
Video Diffusion Models are Training-free Motion Interpreter and Controller

Source: DataCite Commons. Updated: 2025-10-10. Indexed: 2025-04-16.
Download link:
https://researchdata.ntu.edu.sg/citation?persistentId=doi:10.21979/N9/HQM313
Description:
Video generation primarily aims to model authentic and customized motion across frames, making the understanding and control of motion a crucial topic. Most diffusion-based studies on video motion focus on motion customization with training-based paradigms, which demand substantial training resources and require retraining for each new model. Crucially, these approaches do not explore how video diffusion models encode cross-frame motion information in their features, so their effectiveness lacks interpretability and transparency. To address this gap, this paper introduces a novel perspective for understanding, localizing, and manipulating motion-aware features in video diffusion models. Through analysis using Principal Component Analysis (PCA), our work reveals that robust motion-aware features already exist in video diffusion models. We present a new MOtion FeaTure (MOFT), obtained by eliminating content correlation information and filtering motion channels. MOFT provides a distinct set of benefits: it encodes comprehensive motion information with clear interpretability, it can be extracted without training, and it generalizes across diverse architectures. Leveraging MOFT, we propose a novel training-free video motion control framework. Our method demonstrates competitive performance in generating natural and faithful motion, providing architecture-agnostic insights and applicability to a variety of downstream tasks.
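The description states only that MOFT is obtained by "eliminating content correlation information and filtering motion channels", with PCA used for the analysis; it does not give the exact procedure. The following is a minimal NumPy sketch of one plausible reading, not the authors' implementation. It assumes that content removal means subtracting each location's mean over frames, and that channel filtering means keeping the channels with the largest absolute loadings on the first principal component. The function name `extract_motion_feature`, the parameter `num_channels`, and the `(frames, channels, height, width)` layout are all hypothetical.

```python
import numpy as np


def extract_motion_feature(features, num_channels=4):
    """Illustrative sketch only (assumed procedure, hypothetical names).

    features: array of shape (frames, channels, height, width) taken from
    some intermediate layer of a video diffusion model.
    Returns (motion_feature, selected_channels).
    """
    # Assumed step 1: remove content shared by all frames by subtracting
    # the per-channel, per-location mean over the frame axis.
    centered = features - features.mean(axis=0, keepdims=True)

    # Assumed step 2: PCA over the channel dimension. Every (frame, y, x)
    # position is one sample with `channels` variables.
    num_frames, channels, height, width = centered.shape
    samples = centered.transpose(0, 2, 3, 1).reshape(-1, channels)
    samples = samples - samples.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(samples, full_matrices=False)

    # Keep the channels that weigh most on the first principal component.
    loadings = np.abs(vt[0])
    selected = np.sort(np.argsort(loadings)[-num_channels:])

    return centered[:, selected], selected
```

On synthetic features where only a few channels change from frame to frame, this sketch selects exactly those channels, which mirrors the claim that motion information concentrates in a small, identifiable subset of the feature channels.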

Provider:
DR-NTU (Data)
Created:
2024-10-11