VisionRewardDB-Video

Name: VisionRewardDB-Video
Creator: maas
Published: 2026-01-02 16:23:08
License: No description provided

ModelScope community; updated 2026-01-02; indexed 2025-02-15

Download link:

https://modelscope.cn/datasets/ZhipuAI/VisionRewardDB-Video

Description:

# VisionRewardDB-Video

This dataset is a comprehensive collection of video evaluation data designed for multi-dimensional quality assessment of AI-generated videos. It encompasses annotations across 21 diverse aspects, including text-to-video consistency, aesthetic quality, motion dynamics, physical realism, and technical specifications.

🌟✨ [**GitHub Repository**](https://github.com/THUDM/VisionReward) 🔗

The dataset is structured to facilitate both model training and standardized evaluation:

- `Train`: A primary training set with detailed multi-dimensional annotations
- `Regression`: A regression set with paired preference data
- `Test`: A video preference test set for standardized performance evaluation

This holistic approach enables the development and validation of sophisticated video quality assessment models that can evaluate AI-generated videos across multiple critical dimensions, moving beyond simple aesthetic judgments to encompass technical accuracy, semantic consistency, and dynamic performance.

## Annotation Detail

Each video in the dataset is annotated with the following attributes:

<table border="1" style="border-collapse: collapse; width: 100%;">
  <tr>
    <th style="padding: 8px; width: 30%;">Dimension</th>
    <th style="padding: 8px; width: 70%;">Attributes</th>
  </tr>
  <tr>
    <td style="padding: 8px;">Alignment</td>
    <td style="padding: 8px;">Alignment</td>
  </tr>
  <tr>
    <td style="padding: 8px;">Composition</td>
    <td style="padding: 8px;">Composition</td>
  </tr>
  <tr>
    <td style="padding: 8px;">Quality</td>
    <td style="padding: 8px;">Color; Lighting Accurate; Lighting Aes; Clear</td>
  </tr>
  <tr>
    <td style="padding: 8px;">Fidelity</td>
    <td style="padding: 8px;">Detail Refinement; Movement Reality; Letters</td>
  </tr>
  <tr>
    <td style="padding: 8px;">Safety</td>
    <td style="padding: 8px;">Safety</td>
  </tr>
  <tr>
    <td style="padding: 8px;">Stability</td>
    <td style="padding: 8px;">Movement Smoothness; Image Quality Stability; Focus; Camera Movement; Camera Stability</td>
  </tr>
  <tr>
    <td style="padding: 8px;">Preservation</td>
    <td style="padding: 8px;">Shape at Beginning; Shape throughout</td>
  </tr>
  <tr>
    <td style="padding: 8px;">Dynamic</td>
    <td style="padding: 8px;">Object Motion dynamic; Camera Motion dynamic</td>
  </tr>
  <tr>
    <td style="padding: 8px;">Physics</td>
    <td style="padding: 8px;">Physics Law</td>
  </tr>
</table>

### Example: Camera Stability

- **3:** Very stable
- **2:** Slight shake
- **1:** Heavy shake
- Note: When an annotation is missing, the corresponding value is set to **-1**.

For more detailed annotation guidelines (such as the meanings of different scores and the annotation rules), please refer to:

- [annotation_detail](https://flame-spaghetti-eb9.notion.site/VisioinReward-Video-Annotation-Detail-196a0162280e8077b1acef109b3810ff?pvs=4)
- [annotation_detail_zh](https://flame-spaghetti-eb9.notion.site/VisionReward-Video-196a0162280e80e7806af42fc5808c99?pvs=4)

## Additional Feature Detail

The dataset includes three special features: `annotation`, `meta_result`, and `meta_mask`.

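The missing-value convention from the annotation details (a score of -1 means the attribute was not annotated) can be handled with a small filter before any scores are used. The sketch below is illustrative only: it assumes the scores for one video are available as a plain mapping from attribute name to integer, which this page does not confirm, and the example record and helper names are made up.

```python
from typing import Dict, Optional

MISSING = -1  # per the annotation note: -1 marks a missing annotation

# Score labels for Camera Stability, copied from the example in this README.
CAMERA_STABILITY_LABELS = {3: "Very stable", 2: "Slight shake", 1: "Heavy shake"}


def describe_camera_stability(score: int) -> Optional[str]:
    """Return the label for a Camera Stability score, or None if it is missing."""
    if score == MISSING:
        return None
    return CAMERA_STABILITY_LABELS[score]


def drop_missing(annotation: Dict[str, int]) -> Dict[str, int]:
    """Keep only the attributes that actually received a score."""
    return {name: score for name, score in annotation.items() if score != MISSING}


# Hypothetical record: the dict layout and the score values are assumptions.
example = {"Camera Stability": 2, "Focus": -1, "Movement Smoothness": 3}

print(drop_missing(example))
print(describe_camera_stability(example["Camera Stability"]))
```

Filtering first keeps -1 from being mistaken for a very low score when values are averaged or compared.
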
### Annotation

The `annotation` feature contains scores across 21 different dimensions of video assessment, with each dimension having its own scoring criteria as detailed above.

### Meta Result

The `meta_result` feature transforms multi-choice questions into a series of binary judgments. For example, for the `Camera Stability` dimension:

| Score | Is the camera very stable? | Is the camera not unstable? |
|-------|----------------------------|-----------------------------|
| 3     | 1                          | 1                           |
| 2     | 0                          | 1                           |
| 1     | 0                          | 0                           |

- Note: When a `meta_result` value is -1 (meaning the annotation is missing), the corresponding binary judgment should be excluded from consideration.

Each element in the binary array represents a yes/no answer to a specific aspect of the assessment. For the detailed questions corresponding to these binary judgments, please refer to the `meta_qa_en.txt` file.

### Meta Mask

The `meta_mask` feature is used for balanced sampling during model training:

- Elements with value 1 indicate that the corresponding binary judgment was used in training
- Elements with value 0 indicate that the corresponding binary judgment was ignored during training

## Data Processing

```bash
cd videos
tar -xvzf train.tar.gz
tar -xvzf regression.tar.gz
tar -xvzf test.tar.gz
```

We provide `extract.py` for processing the `train` dataset into JSONL format. The script can optionally extract the balanced positive/negative QA pairs used in VisionReward training by processing the `meta_result` and `meta_mask` fields.

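To make the relationship between the two fields concrete, here is a minimal sketch of the selection rule as described in this README: keep a binary judgment only when its mask is 1 and its result is not -1. It is not the code in `extract.py`. The function names are hypothetical, `meta_result` and `meta_mask` are assumed to be equal-length flat lists of integers, and the score-to-binary mapping covers only the `Camera Stability` table shown earlier.

```python
from typing import List, Sequence, Tuple

MISSING = -1  # a meta_result of -1 means the annotation is missing


def camera_stability_to_binary(score: int) -> List[int]:
    """Reproduce the Camera Stability table: [very stable?, not unstable?]."""
    if score == MISSING:
        # Assumption: a missing score yields missing binary judgments.
        return [MISSING, MISSING]
    # "Very stable" holds only for score 3; "not unstable" holds for 2 and 3.
    return [int(score >= 3), int(score >= 2)]


def select_training_judgments(
    meta_result: Sequence[int], meta_mask: Sequence[int]
) -> List[Tuple[int, int]]:
    """Return (question_index, answer) pairs that are masked in and not missing."""
    if len(meta_result) != len(meta_mask):
        raise ValueError("meta_result and meta_mask must have the same length")
    return [
        (index, answer)
        for index, (answer, mask) in enumerate(zip(meta_result, meta_mask))
        if mask == 1 and answer != MISSING
    ]


# Made-up values for illustration; real arrays come from the dataset.
print(camera_stability_to_binary(2))  # [0, 1]
print(select_training_judgments([1, 0, -1, 1], [1, 0, 1, 1]))  # [(0, 1), (3, 1)]
```

The returned indices refer to positions in the binary array, so they can be matched against the questions listed in `meta_qa_en.txt`, assuming that file follows the same order.
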
```bash
python extract.py
```

## Citation Information

```
@misc{xu2024visionrewardfinegrainedmultidimensionalhuman,
      title={VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation},
      author={Jiazheng Xu and Yu Huang and Jiale Cheng and Yuanming Yang and Jiajun Xu and Yuan Wang and Wenbo Duan and Shen Yang and Qunlin Jin and Shurun Li and Jiayan Teng and Zhuoyi Yang and Wendi Zheng and Xiao Liu and Ming Ding and Xiaohan Zhang and Xiaotao Gu and Shiyu Huang and Minlie Huang and Jie Tang and Yuxiao Dong},
      year={2024},
      eprint={2412.21059},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.21059},
}
```


Provider: maas

Created: 2025-02-13
