
VideoGen-RewardBench

ModelScope community. Updated 2025-11-27; indexed 2025-11-15.
Download link:
https://modelscope.cn/datasets/KwaiVGI/VideoGen-RewardBench
<div align="center">
<p align="center">
🏆 <a href="https://huggingface.co/spaces/KwaiVGI/VideoGen-RewardBench" target="_blank">[VideoGen-RewardBench Leaderboard]</a>
</p>
</div>

## Introduction

**VideoGen-RewardBench** is a benchmark for evaluating video reward models on modern text-to-video (T2V) systems. Building on the third-party [VideoGen-Eval](https://github.com/AILab-CVC/VideoGen-Eval/tree/main) (Zeng et al., 2024), we construct 26.5k (prompt, Video A, Video B) triplets and employ expert annotators to provide pairwise preference labels. Each pair is labeled along three evaluation dimensions, **Visual Quality (VQ)**, **Motion Quality (MQ)**, and **Text Alignment (TA)**, plus an overall quality preference. The benchmark covers a diverse range of prompts and videos generated by 12 state-of-the-art T2V models, with high resolutions (480×720 to 576×1024) and longer durations (4s to 6s). It is intended as a fair evaluation framework that reflects human preferences on the output of recent video generation models.

## Dataset Structure

### Data Instances

An example looks as follows:

```json
{
  "path_A": "videos/kling1.5/kling1.5_00103.mp4",
  "path_B": "videos/minimax/minimax_00103.mp4",
  "A_model": "kling1.5",
  "B_model": "minimax",
  "prompt": "Static camera, a metal ball rolls on a smooth tabletop.",
  "VQ": "A",
  "MQ": "A",
  "TA": "A",
  "Overall": "A",
  "fps_A": 30.0,
  "num_frames_A": 153.0,
  "fps_B": 25.0,
  "num_frames_B": 141.0
}
```

### Data Fields

The data fields are:

- `path_A`: The file path of Video A in the pair.
- `path_B`: The file path of Video B in the pair.
- `A_model`: The name of the model that generated Video A.
- `B_model`: The name of the model that generated Video B.
- `prompt`: The text prompt used to generate both videos.
- `VQ`: The video with better visual quality between Video A and Video B.
- `MQ`: The video with better motion quality between Video A and Video B.
- `TA`: The video with better text alignment between Video A and Video B.
- `Overall`: The video with better overall quality between Video A and Video B.
- `fps_A`: The FPS of Video A.
- `num_frames_A`: The number of frames in Video A.
- `fps_B`: The FPS of Video B.
- `num_frames_B`: The number of frames in Video B.

## Citation

If you find this project useful, please consider citing:

```bibtex
@article{liu2025improving,
  title={Improving Video Generation with Human Feedback},
  author={Jie Liu and Gongye Liu and Jiajun Liang and Ziyang Yuan and Xiaokun Liu and Mingwu Zheng and Xiele Wu and Qiulin Wang and Wenyu Qin and Menghan Xia and Xintao Wang and Xiaohong Liu and Fei Yang and Pengfei Wan and Di Zhang and Kun Gai and Yujiu Yang and Wanli Ouyang},
  journal={arXiv preprint arXiv:2501.13918},
  year={2025}
}
```
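As a rough illustration of how the record layout under Data Fields might be consumed, the Python sketch below derives clip durations from `num_frames` and `fps` and scores a reward model's pairwise choices against the human labels. The helper names `duration_seconds` and `pairwise_accuracy`, the shape of `predictions`, and the rule of skipping any label other than `"A"` or `"B"` are assumptions made for this example, not part of the dataset or its official evaluation code.

```python
from collections import Counter

# Preference dimensions stored in each record (see Data Fields).
DIMENSIONS = ("VQ", "MQ", "TA", "Overall")

# The sample record shown under Data Instances.
record = {
    "path_A": "videos/kling1.5/kling1.5_00103.mp4",
    "path_B": "videos/minimax/minimax_00103.mp4",
    "A_model": "kling1.5",
    "B_model": "minimax",
    "prompt": "Static camera, a metal ball rolls on a smooth tabletop.",
    "VQ": "A",
    "MQ": "A",
    "TA": "A",
    "Overall": "A",
    "fps_A": 30.0,
    "num_frames_A": 153.0,
    "fps_B": 25.0,
    "num_frames_B": 141.0,
}


def duration_seconds(rec, side):
    """Clip length in seconds for side "A" or "B": num_frames / fps."""
    return rec[f"num_frames_{side}"] / rec[f"fps_{side}"]


def pairwise_accuracy(records, predictions):
    """Per-dimension fraction of pairs where the predicted winner matches the label.

    `predictions` is a hypothetical list of dicts, one per record, mapping
    each dimension to "A" or "B".
    Assumption: labels other than "A" or "B", if any exist, are skipped.
    """
    correct, total = Counter(), Counter()
    for rec, pred in zip(records, predictions):
        for dim in DIMENSIONS:
            if rec[dim] not in ("A", "B"):
                continue
            total[dim] += 1
            correct[dim] += int(pred[dim] == rec[dim])
    return {dim: correct[dim] / total[dim] for dim in DIMENSIONS if total[dim]}


# Made-up model output for the single sample record: it disagrees on MQ only.
predictions = [{"VQ": "A", "MQ": "B", "TA": "A", "Overall": "A"}]

print(duration_seconds(record, "A"), duration_seconds(record, "B"))
print(pairwise_accuracy([record], predictions))
```

For the sample record, both durations (153 frames at 30 FPS and 141 frames at 25 FPS) fall inside the 4s to 6s range stated in the introduction.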

Provider: maas
Created: 2025-09-03