VILA-U
Source: arXiv, indexed 2025-09-30
Download link:
https://github.com/shawnricecake/fast-car
Description:
This entry covers VILA-U, an autoregressive (AR) video generation model built on LLaMA-2-7B and RQ-VAE. Videos generated by the model are evaluated with the objective metrics PSNR, SSIM, and LPIPS. The quantizer uses a codebook of size 16384, and all videos are generated from specific prompts taken from the VBench benchmark. Each generated video consists of 8 frames at a resolution of 256×256.
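As a rough illustration of PSNR, one of the metrics named above, the sketch below computes it for an 8-frame 256×256 clip with NumPy. The function names `psnr` and `video_psnr`, the choice of reference video, the 8-bit pixel range, and the per-frame averaging are all assumptions made for this example; the listing does not specify how the scores were actually computed.

```python
import numpy as np


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two same-shaped arrays."""
    reference = np.asarray(reference, dtype=np.float64)
    generated = np.asarray(generated, dtype=np.float64)
    if reference.shape != generated.shape:
        raise ValueError("reference and generated must have the same shape")
    mse = np.mean((reference - generated) ** 2)
    if mse == 0:
        return float("inf")  # identical inputs
    return float(10.0 * np.log10(max_value ** 2 / mse))


def video_psnr(reference_video, generated_video, max_value=255.0):
    """Mean of per-frame PSNR (assumed convention, not taken from the listing)."""
    scores = [
        psnr(ref_frame, gen_frame, max_value)
        for ref_frame, gen_frame in zip(reference_video, generated_video)
    ]
    return float(np.mean(scores))


# Hypothetical clip: 8 frames, 256x256, RGB, 8-bit pixel values.
reference_clip = np.zeros((8, 256, 256, 3), dtype=np.uint8)
generated_clip = np.ones((8, 256, 256, 3), dtype=np.uint8)  # off by 1 everywhere
score = video_psnr(reference_clip, generated_clip)
```

With every pixel off by exactly one intensity level, the mean squared error is 1, so the score equals 20·log10(255) dB. SSIM and LPIPS need dedicated implementations and are not sketched here.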
Provider:
Open-source



