FlintstonesHD

Name: FlintstonesHD
Creator: OpenDataLab
Published: 2026-05-24 12:30:44
License: No description provided

OpenDataLab | Updated 2026-05-24 | Indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/FlintstonesHD

Resource overview:

We propose NUWA-XL, a novel diffusion architecture for generating extremely long videos. Most existing work generates long videos sequentially, segment by segment, which typically creates a gap between training on short videos and inferring long ones, and makes generation inefficient. In contrast, our method adopts a "coarse-to-fine" process in which the video can be generated in parallel at the same granularity. A global diffusion model is applied to generate key frames across the entire time range, and a local diffusion model then recursively fills in the content between nearby frames. This simple yet effective strategy allows us to train directly on long videos (3376 frames) to reduce the training-inference gap, and makes it possible to generate all segments in parallel. To evaluate our model, we build the FlintstonesHD dataset, a new benchmark for long video generation. Experiments show that our model not only generates high-quality long videos with both global and local coherence, but also reduces the average inference time from 7.55 minutes to 26 seconds (a 94.26% reduction) under the same hardware setting when generating 1024 frames.
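
To make the coarse-to-fine idea concrete, the following is a minimal sketch of the frame schedule only, not of the diffusion models themselves. It assumes a video of k**depth + 1 frames in which every level subdivides each gap into k parts; the function name coarse_to_fine_levels and the parameters k and depth are hypothetical and are not taken from the NUWA-XL paper, whose actual frame layout may differ. Level 0 stands in for the global model's key frames, and each later level stands in for one pass of local in-filling, where all frames in a level could be produced in parallel.

```python
from typing import List


def coarse_to_fine_levels(k: int, depth: int) -> List[List[int]]:
    """Group frame indices by the level at which they would be generated.

    Hypothetical layout: frames are indexed 0 .. k**depth. Level 0 holds the
    coarse key frames; each following level holds the frames that fill the
    gaps left by the previous levels, at a k-times finer stride.
    """
    if k < 2 or depth < 1:
        raise ValueError("k must be >= 2 and depth must be >= 1")

    last = k ** depth
    generated = set()
    levels: List[List[int]] = []
    for level in range(depth):
        stride = k ** (depth - 1 - level)  # gap between frames at this level
        new_frames = [i for i in range(0, last + 1, stride) if i not in generated]
        generated.update(new_frames)
        levels.append(new_frames)
    return levels


if __name__ == "__main__":
    for level, frames in enumerate(coarse_to_fine_levels(k=4, depth=2)):
        print(f"level {level}: {frames}")
```

In this sketch the number of sequential passes equals depth, while the total number of frames grows as k**depth. This illustrates why a coarse-to-fine schedule can need far fewer sequential steps than segment-by-segment generation.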

Provider:

OpenDataLab

Created:

2023-10-11

Collected summary

Dataset introduction