ImageRewardDB

Name: ImageRewardDB
Creator: maas
Published: 2025-12-10 16:16:40
License: No description provided

ModelScope community. Updated: 2025-12-10. Indexed: 2024-08-31.

Download link:

https://modelscope.cn/datasets/ZhipuAI/ImageRewardDB

Resource description:

# ImageRewardDB

## Dataset Description

- **Homepage: https://huggingface.co/datasets/wuyuchen/ImageRewardDB**
- **Repository: https://github.com/THUDM/ImageReward**
- **Paper: https://arxiv.org/abs/2304.05977**

### Dataset Summary

ImageRewardDB is a comprehensive text-to-image comparison dataset focused on human preference in text-to-image generation. It consists of 137k pairs of expert comparisons, based on text prompts and corresponding model outputs from DiffusionDB. To build ImageRewardDB, we designed a dedicated annotation pipeline that establishes criteria for quantitative assessment and annotator training, optimizes the labeling experience, and ensures quality validation. ImageRewardDB is now publicly available at [🤗 Hugging Face Dataset](https://huggingface.co/datasets/wuyuchen/ImageRewardDB).

Notice: All images in ImageRewardDB are collected from DiffusionDB, and in addition, we gathered together images corresponding to the same prompt.

### Languages

The text in the dataset is all in English.

### Four Subsets

Because ImageRewardDB contains a large number of images, we provide four subsets at different scales to support different needs. For all subsets, the validation and test splits remain the same. The validation split (1.10GB) contains 412 prompts and 2.6K images (7.32K pairs), and the test split (1.16GB) contains 466 prompts and 2.7K images (7.23K pairs). The train split at each scale is as follows:

|Subset|Num of Pairs|Num of Images|Num of Prompts|Size|
|:--|--:|--:|--:|--:|
|ImageRewardDB 1K|17.6K|6.2K|1K|2.7GB|
|ImageRewardDB 2K|35.5K|12.5K|2K|5.5GB|
|ImageRewardDB 4K|71.0K|25.1K|4K|10.8GB|
|ImageRewardDB 8K|141.1K|49.9K|8K|20.9GB|

## Dataset Structure

The 62.6K images in ImageRewardDB are split into several folders, stored in the corresponding directories under "./images" according to their split.
Each folder contains around 500 prompts, their corresponding images, and a JSON file. The JSON file links each image with its corresponding prompt and annotation. The file structure is as follows:

```
# ImageRewardDB
./
├── images
│   ├── train
│   │   ├── train_1
│   │   │   ├── 0a1ed3a5-04f6-4a1b-aee6-d584e7c8ed9c.webp
│   │   │   ├── 0a58cfa8-ff61-4d31-9757-27322aec3aaf.webp
│   │   │   ├── [...]
│   │   │   └── train_1.json
│   │   ├── train_2
│   │   ├── train_3
│   │   ├── [...]
│   │   └── train_32
│   ├── validation
│   │   └── [...]
│   └── test
│       └── [...]
├── metadata-train.parquet
├── metadata-validation.parquet
└── metadata-test.parquet
```

The sub-folders are named {split_name}_{part_id}, and the JSON file has the same name as its sub-folder. Each image is a lossless WebP file and has a unique name generated by [UUID](https://en.wikipedia.org/wiki/Universally_unique_identifier).

### Data Instances

For instance, below is the record for one image as stored in train_1.json.

```json
{
    "image_path": "images/train/train_1/0280642d-f69f-41d1-8598-5a44e296aa8b.webp",
    "prompt_id": "000864-0061",
    "prompt": "painting of a holy woman, decorated, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha, 8 k ",
    "classification": "People",
    "image_amount_in_total": 9,
    "rank": 5,
    "overall_rating": 4,
    "image_text_alignment_rating": 3,
    "fidelity_rating": 4
}
```

### Data Fields

* image: The image object
* prompt_id: The id of the corresponding prompt
* prompt: The text of the corresponding prompt
* classification: The classification of the corresponding prompt
* image_amount_in_total: Total number of images related to the prompt
* rank: The relative rank of the image among all related images
* overall_rating: The overall score of this image
* image_text_alignment_rating: The score of how well the generated image matches the given text
* fidelity_rating: The score of whether the output image is true to the shape and characteristics that the object should have

### Data Splits

As mentioned above, every subset has three splits: "train", "validation", and "test". All subsets share the same validation and test splits.

### Dataset Metadata

We also include three metadata tables, `metadata-train.parquet`, `metadata-validation.parquet`, and `metadata-test.parquet`, to help you access and understand ImageRewardDB without downloading the Zip files. All the tables share the same schema, and each row refers to one image. The schema is shown below; the JSON files mentioned above use the same schema:

|Column|Type|Description|
|:---|:---|:---|
|`image_path`|`string`|The relative path of the image in the repository.|
|`prompt_id`|`string`|The id of the corresponding prompt.|
|`prompt`|`string`|The text of the corresponding prompt.|
|`classification`|`string`|The classification of the corresponding prompt.|
|`image_amount_in_total`|`int`|Total number of images related to the prompt.|
|`rank`|`int`|The relative rank of the image among all related images.|
|`overall_rating`|`int`|The overall score of this image.|
|`image_text_alignment_rating`|`int`|The score of how well the generated image matches the given text.|
|`fidelity_rating`|`int`|The score of whether the output image is true to the shape and characteristics that the object should have.|

Below is an example row from metadata-train.parquet.

|image_path|prompt_id|prompt|classification|image_amount_in_total|rank|overall_rating|image_text_alignment_rating|fidelity_rating|
|:---|:---|:---|:---|:---|:---|:---|:---|:---|
|images/train/train_1/1b4b2d61-89c2-4091-a1c0-f547ad5065cb.webp|001324-0093|a magical forest that separates the good world from the dark world, ...|Outdoor Scenes|8|3|6|6|6|

## Loading ImageRewardDB

You can use the Hugging Face [Datasets](https://huggingface.co/docs/datasets/quickstart) library to load ImageRewardDB. As mentioned before, we provide four subsets at the scales of 1k, 2k, 4k, and 8k. You can load them as follows:

```python
from datasets import load_dataset

# Load the 1K-scale dataset
dataset = load_dataset("THUDM/ImageRewardDB", "1k")

# Load the 2K-scale dataset
dataset = load_dataset("THUDM/ImageRewardDB", "2k")

# Load the 4K-scale dataset
dataset = load_dataset("THUDM/ImageRewardDB", "4k")

# Load the 8K-scale dataset
dataset = load_dataset("THUDM/ImageRewardDB", "8k")
```

## Additional Information

### Licensing Information

The ImageRewardDB dataset is available under the [Apache license 2.0](https://www.apache.org/licenses/LICENSE-2.0.html). The Python code in this repository is available under the [MIT License](https://github.com/poloclub/diffusiondb/blob/main/LICENSE).

### Citation Information

```
@misc{xu2023imagereward,
      title={ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation},
      author={Jiazheng Xu and Xiao Liu and Yuchen Wu and Yuxuan Tong and Qinkai Li and Ming Ding and Jie Tang and Yuxiao Dong},
      year={2023},
      eprint={2304.05977},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
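
### Example: Comparison Pairs from Ranks

The card reports both a number of images and a number of pairs, while each metadata row describes a single image with a `rank` among the images of its prompt. The sketch below illustrates one plausible way to turn such rows into comparison pairs. It rests on two assumptions that this card does not state: a smaller `rank` value means a more preferred image, and two images with the same rank produce no pair. The function name `build_preference_pairs` and the sample rows are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict
from itertools import combinations


def build_preference_pairs(rows):
    """Return (preferred_path, other_path) tuples for images that share a prompt_id.

    Assumptions (not confirmed by the dataset card): a smaller rank is better,
    and images with equal rank are skipped.
    """
    # Group rows by prompt so that only images of the same prompt are compared.
    by_prompt = defaultdict(list)
    for row in rows:
        by_prompt[row["prompt_id"]].append(row)

    pairs = []
    for group in by_prompt.values():
        for a, b in combinations(group, 2):
            if a["rank"] == b["rank"]:
                continue  # assumed: ties yield no comparison
            better, worse = (a, b) if a["rank"] < b["rank"] else (b, a)
            pairs.append((better["image_path"], worse["image_path"]))
    return pairs


# Hypothetical rows using a subset of the documented columns.
rows = [
    {"image_path": "images/train/train_1/a.webp", "prompt_id": "p-0001", "rank": 1},
    {"image_path": "images/train/train_1/b.webp", "prompt_id": "p-0001", "rank": 2},
    {"image_path": "images/train/train_1/c.webp", "prompt_id": "p-0001", "rank": 2},
    {"image_path": "images/train/train_1/d.webp", "prompt_id": "p-0002", "rank": 1},
]

pairs = build_preference_pairs(rows)
for preferred, other in pairs:
    print(preferred, ">", other)
```

With real data, the rows could come from one of the metadata tables, for example by reading `metadata-train.parquet` into a list of dictionaries; the pair counts obtained this way may differ from the figures in the subset table if the authors used a different pairing rule.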


Provider:

maas

Created:

2024-08-19
