Download link:

https://modelscope.cn/datasets/ZhipuAI/VisionRewardDB-Image


Resource overview:

# VisionRewardDB-Image

## Introduction

VisionRewardDB-Image is a comprehensive dataset designed to train VisionReward-Image models, providing detailed aesthetic annotations across 18 aspects. The dataset aims to enhance the assessment and understanding of visual aesthetics and quality. 🌟✨

For more details, please refer to the [**GitHub Repository**](https://github.com/THUDM/VisionReward). 🔍📚

## Annotation Detail

Each image in the dataset is annotated with the following attributes:

<table border="1" style="border-collapse: collapse; width: 100%;">
  <tr>
    <th style="padding: 8px; width: 30%;">Dimension</th>
    <th style="padding: 8px; width: 70%;">Attributes</th>
  </tr>
  <tr>
    <td style="padding: 8px;">Composition</td>
    <td style="padding: 8px;">Symmetry; Object pairing; Main object; Richness; Background</td>
  </tr>
  <tr>
    <td style="padding: 8px;">Quality</td>
    <td style="padding: 8px;">Clarity; Color Brightness; Color Aesthetic; Lighting Distinction; Lighting Aesthetic</td>
  </tr>
  <tr>
    <td style="padding: 8px;">Fidelity</td>
    <td style="padding: 8px;">Detail realism; Detail refinement; Body; Face; Hands</td>
  </tr>
  <tr>
    <td style="padding: 8px;">Safety & Emotion</td>
    <td style="padding: 8px;">Emotion; Safety</td>
  </tr>
</table>

### Example: Scene Richness (richness)

- **2:** Very rich
- **1:** Rich
- **0:** Normal
- **-1:** Monotonous
- **-2:** Empty

For more detailed annotation guidelines (such as the meanings of different scores and the annotation rules), please refer to:

- [annotation_detail](https://flame-spaghetti-eb9.notion.site/VisionReward-Image-Annotation-Detail-196a0162280e80ef8359c38e9e41247e)
- [annotation_detail_zh](https://flame-spaghetti-eb9.notion.site/VisionReward-Image-195a0162280e8044bcb4ec48d000409c)

## Additional Feature Detail

The dataset includes three special features: `annotation`, `meta_result`, and `meta_mask`.

### Annotation

The `annotation` feature contains scores across 18 different dimensions of image assessment, with each dimension having its own scoring criteria as detailed above.

### Meta Result

The `meta_result` feature transforms multi-choice questions into a series of binary judgments. For example, for the `richness` dimension:

| Score | Is the image very rich? | Is the image rich? | Is the image not monotonous? | Is the image not empty? |
|-------|-------------------------|--------------------|------------------------------|-------------------------|
| 2     | 1                       | 1                  | 1                            | 1                       |
| 1     | 0                       | 1                  | 1                            | 1                       |
| 0     | 0                       | 0                  | 1                            | 1                       |
| -1    | 0                       | 0                  | 0                            | 1                       |
| -2    | 0                       | 0                  | 0                            | 0                       |

Each element in the binary array represents a yes/no answer to a specific aspect of the assessment. For detailed questions corresponding to these binary judgments, please refer to the `meta_qa_en.txt` file.

### Meta Mask

The `meta_mask` feature is used for balanced sampling during model training:

- Elements with value 1 indicate that the corresponding binary judgment was used in training
- Elements with value 0 indicate that the corresponding binary judgment was ignored during training

## Data Processing

We provide `extract.py` for processing the dataset into JSONL format. The script can optionally extract the balanced positive/negative QA pairs used in VisionReward training by processing the `meta_result` and `meta_mask` fields.

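
To make the relationship between a score, its binary judgments, and the mask concrete, here is a minimal sketch for the `richness` dimension only. It is not the code in `extract.py`: the names `RICHNESS_QUESTIONS`, `score_to_binary`, and `masked_qa_pairs` are hypothetical, the thresholds are read off the richness table, and the assumption that `meta_result` and `meta_mask` are aligned element by element is an illustration rather than a documented schema. The real fields span all dimensions, with questions listed in `meta_qa_en.txt`.

```python
from typing import List, Tuple

# Hypothetical per-question thresholds for richness, inferred from the table:
# a question is answered 1 when the score is at or above its threshold.
RICHNESS_QUESTIONS: List[Tuple[str, int]] = [
    ("Is the image very rich?", 2),
    ("Is the image rich?", 1),
    ("Is the image not monotonous?", 0),
    ("Is the image not empty?", -1),
]


def score_to_binary(score: int) -> List[int]:
    """Turn one richness score (-2..2) into its row of binary judgments."""
    return [1 if score >= threshold else 0 for _, threshold in RICHNESS_QUESTIONS]


def masked_qa_pairs(meta_result: List[int], meta_mask: List[int]) -> List[Tuple[str, str]]:
    """Keep only the judgments whose mask value is 1, as (question, answer) pairs."""
    pairs = []
    for (question, _), answer, keep in zip(RICHNESS_QUESTIONS, meta_result, meta_mask):
        if keep == 1:
            pairs.append((question, "yes" if answer == 1 else "no"))
    return pairs


if __name__ == "__main__":
    row = score_to_binary(0)  # the "Normal" row of the table
    print(row)
    print(masked_qa_pairs(row, [1, 0, 0, 1]))
```

The threshold form reproduces every row of the richness table. Whether other dimensions follow the same cumulative pattern should be checked against the annotation guidelines.
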
Run the script as follows; both flags are optional:

```bash
python extract.py [--save_imgs] [--process_qa]
```

## Citation Information

```
@misc{xu2024visionrewardfinegrainedmultidimensionalhuman,
  title={VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation},
  author={Jiazheng Xu and Yu Huang and Jiale Cheng and Yuanming Yang and Jiajun Xu and Yuan Wang and Wenbo Duan and Shen Yang and Qunlin Jin and Shurun Li and Jiayan Teng and Zhuoyi Yang and Wendi Zheng and Xiao Liu and Ming Ding and Xiaohan Zhang and Xiaotao Gu and Shiyu Huang and Minlie Huang and Jie Tang and Yuxiao Dong},
  year={2024},
  eprint={2412.21059},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2412.21059},
}
```


Use cases: