
GenAI-Bench-1600

ModelScope Community, updated 2026-05-01, indexed 2025-04-12
Download link:
https://modelscope.cn/datasets/AI-ModelScope/GenAI-Bench-1600
Resource overview:
# ***GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation***

---

<div align="center">

Baiqi Li<sup>1*</sup>, Zhiqiu Lin<sup>1,2*</sup>, Deepak Pathak<sup>1</sup>, Jiayao Li<sup>1</sup>, Yixin Fei<sup>1</sup>, Kewen Wu<sup>1</sup>, Tiffany Ling<sup>1</sup>, Xide Xia<sup>2†</sup>, Pengchuan Zhang<sup>2†</sup>, Graham Neubig<sup>1†</sup>, and Deva Ramanan<sup>1†</sup>.

</div>

<div align="center" style="font-weight:bold;">

<sup>1</sup>Carnegie Mellon University, <sup>2</sup>Meta

</div>

<!-- ![](https://huggingface.co/datasets/BaiqiL/GenAI-Bench/resolve/main/vqascore_leaderboard.jpg) -->

## Links:

<div align="center">

[**📖Paper**](https://arxiv.org/pdf/2406.13743) | [🏠**Home Page**](https://linzhiqiu.github.io/papers/genai_bench) | [🔍**GenAI-Bench Dataset Viewer**](https://huggingface.co/spaces/BaiqiL/GenAI-Bench-DataViewer) | [**🏆Leaderboard**](#Leaderboard)

</div>

<div align="center">

[🗂️GenAI-Bench-1600 (ZIP format)](https://huggingface.co/datasets/BaiqiL/GenAI-Bench-1600) | [🗂️GenAI-Bench-Video (ZIP format)](https://huggingface.co/datasets/zhiqiulin/GenAI-Bench-800) | [🗂️GenAI-Bench-Ranking (ZIP format)](https://huggingface.co/datasets/zhiqiulin/GenAI-Image-Ranking-800)

</div>

## 🚩 **News**

- ✅ Aug. 18, 2024. 💥 GenAI-Bench-1600 is used by 🧨 [**Imagen 3**](https://arxiv.org/abs/2408.07009)!
- ✅ Jun. 19, 2024. 💥 Our [paper](https://openreview.net/pdf?id=hJm7qnW3ym) won the **Best Paper** award at the **CVPR SynData4CV workshop**!
## Usage

```python
# load the GenAI-Bench (GenAI-Bench-1600) benchmark
from datasets import load_dataset

dataset = load_dataset("BaiqiL/GenAI-Bench")
```

## Citation Information

```
@article{li2024genai,
  title={GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation},
  author={Li, Baiqi and Lin, Zhiqiu and Pathak, Deepak and Li, Jiayao and Fei, Yixin and Wu, Kewen and Ling, Tiffany and Xia, Xide and Zhang, Pengchuan and Neubig, Graham and others},
  journal={arXiv preprint arXiv:2406.13743},
  year={2024}
}
```

![](https://huggingface.co/datasets/BaiqiL/GenAI-Bench-pictures/resolve/main/GenAI-Bench.jpg)

![](https://huggingface.co/datasets/BaiqiL/GenAI-Bench-pictures/resolve/main/genaibench_examples.jpg)

## Description:

Our dataset consists of three parts: **GenAI-Bench (GenAI-Bench-1600)**, **GenAI-Bench-Video**, and **GenAI-Bench-Ranking**. GenAI-Bench-1600 is the primary dataset. For details on how the ZIP-format releases of these datasets are processed, see `dataset.py` in the [code repository](https://github.com/linzhiqiu/t2v_metrics).

[**GenAI-Bench benchmark (GenAI-Bench-1600)**](https://huggingface.co/datasets/BaiqiL/GenAI-Bench-1600) consists of 1,600 challenging real-world text prompts sourced from professional designers. Compared to benchmarks such as PartiPrompts and T2I-CompBench, GenAI-Bench covers a wider range of aspects of compositional text-to-visual generation. These range from _basic_ skills (scene, attribute, relation) to _advanced_ skills (counting, comparison, differentiation, logic). The benchmark also collects human alignment ratings on a 1-to-5 Likert scale. These ratings cover images and videos generated by ten leading models, such as Stable Diffusion, DALL-E 3, Midjourney v6, Pika v1, and Gen2.

GenAI-Bench:

- Prompts: 1,600 prompts sourced from professional designers.
- Compositional Skill Tags: each prompt carries multiple compositional tags. The tags are categorized into **_Basic Skill_** and **_Advanced Skill_**. For detailed definitions and examples, please refer to [our paper](https://arxiv.org/pdf/2406.13743).
- Images: generated images are collected from DALLE_3, DeepFloyd_I_XL_v1, Midjourney_6, SDXL_2_1, SDXL_Base, and SDXL_Turbo.
- Human Ratings: 1-to-5 Likert scale ratings for each image.

**(Other Datasets: [GenAI-Bench-Video](https://huggingface.co/datasets/zhiqiulin/GenAI-Bench-800) | [GenAI-Bench-Ranking](https://huggingface.co/datasets/zhiqiulin/GenAI-Image-Ranking-800))**

### Languages

English

### Supported Tasks

Text-to-visual generation; evaluation of automated text-to-visual evaluation metrics.

### Comparing GenAI-Bench to Existing Text-to-Visual Benchmarks

![](https://huggingface.co/datasets/BaiqiL/GenAI-Bench-pictures/resolve/main/Comparison.png)

## Dataset Structure

### Data Instances

```
Dataset({
    features: ['Index', 'Prompt', 'Tags', 'HumanRatings', 'DALLE_3', 'DeepFloyd_I_XL_v1', 'Midjourney_6', 'SDXL_2_1', 'SDXL_Base', 'SDXL_Turbo'],
    num_rows: 1600
})
```

### Data Fields

Name | Explanation
--- | ---
`Index` | **Description:** the unique ID of an example. **Data type:** string
`Prompt` | **Description:** the text prompt. **Data type:** string
`Tags` | **Description:** compositional skill tags of the prompt, split into basic and advanced skills. **Data type:** dict
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;`basic_skills` | **Description:** basic skills in the prompt. **Data type:** list
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;`advanced_skills` | **Description:** advanced skills in the prompt. **Data type:** list
`DALLE_3` | **Description:** image generated by DALLE_3. **Data type:** PIL.JpegImagePlugin.JpegImageFile
`Midjourney_6` | **Description:** image generated by Midjourney_6. **Data type:** PIL.JpegImagePlugin.JpegImageFile
`DeepFloyd_I_XL_v1` | **Description:** image generated by DeepFloyd_I_XL_v1. **Data type:** PIL.JpegImagePlugin.JpegImageFile
`SDXL_2_1` | **Description:** image generated by SDXL_2_1. **Data type:** PIL.JpegImagePlugin.JpegImageFile
`SDXL_Base` | **Description:** image generated by SDXL_Base. **Data type:** PIL.JpegImagePlugin.JpegImageFile
`SDXL_Turbo` | **Description:** image generated by SDXL_Turbo. **Data type:** PIL.JpegImagePlugin.JpegImageFile
`HumanRatings` | **Description:** human ratings of how well each model's image matches the prompt, keyed by model. **Data type:** dict
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;`DALLE_3` | **Description:** human ratings of how well the DALLE_3 image matches the prompt. **Data type:** list
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;`SDXL_Turbo` | **Description:** human ratings of how well the SDXL_Turbo image matches the prompt. **Data type:** list
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;`Midjourney_6` | **Description:** human ratings of how well the Midjourney_6 image matches the prompt. **Data type:** list
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;`DeepFloyd_I_XL_v1` | **Description:** human ratings of how well the DeepFloyd_I_XL_v1 image matches the prompt. **Data type:** list
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;`SDXL_2_1` | **Description:** human ratings of how well the SDXL_2_1 image matches the prompt. **Data type:** list
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;`SDXL_Base` | **Description:** human ratings of how well the SDXL_Base image matches the prompt. **Data type:** list

### Statistics

Dataset | Number of Prompts | Number of Skill Tags | Number of Images | Number of Videos | Number of Human Ratings
--- | ---: | ---: | ---: | ---: | ---:
GenAI-Bench | 1,600 | 5,000+ | 9,600 | -- | 28,800
GenAI-Bench-Video | 800 | 2,500+ | -- | 3,200 | 9,600
GenAI-Bench-Ranking | 800 | 2,500+ | 14,400 | -- | 43,200

(Each prompt-image/video pair has three human ratings.)

## Data Source

### Prompts

All prompts are sourced from professional designers who use tools such as Midjourney and CIVITAI.

### Multiple Compositional Tags for Prompts

All tags on each prompt are verified by human annotators.

### Generated Images

Images were generated for all 1,600 GenAI-Bench prompts using DALLE_3, DeepFloyd_I_XL_v1, Midjourney_6, SDXL_2_1, SDXL_Base, and SDXL_Turbo.

### Generated Videos

Videos were generated for all 800 GenAI-Bench-Video prompts using Pika, Gen2, ModelScope, and Floor33.
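The Data Fields and Statistics sections fix the per-row layout: six image columns, and under `HumanRatings` one list of ratings per model, with three ratings per image. The sketch below shows one way to summarize those ratings per model and per skill tag, and to check the image and rating counts in the Statistics table.

This is a minimal sketch under stated assumptions, not the authors' evaluation code. It assumes each row behaves like a plain dict. `Tags["basic_skills"]` and `Tags["advanced_skills"]` are assumed to be lists of tag strings, and `HumanRatings[model]` a list of integer 1-to-5 scores. The helper names `mean_rating_per_model`, `mean_rating_per_skill`, and `expected_counts` are hypothetical.

```python
from statistics import mean

# Image columns listed under "Data Instances"; the same names key HumanRatings.
IMAGE_MODELS = ["DALLE_3", "DeepFloyd_I_XL_v1", "Midjourney_6",
                "SDXL_2_1", "SDXL_Base", "SDXL_Turbo"]


def mean_rating_per_model(records):
    """Mean 1-to-5 rating per model, pooling every annotator score.

    Assumption: record["HumanRatings"][model] is a list of integer scores.
    """
    pooled = {model: [] for model in IMAGE_MODELS}
    for record in records:
        for model in IMAGE_MODELS:
            pooled[model].extend(record["HumanRatings"][model])
    return {model: mean(scores) for model, scores in pooled.items() if scores}


def mean_rating_per_skill(records, model, level="basic_skills"):
    """Mean rating of one model's images, grouped by skill tag.

    Each image is first reduced to the mean of its annotators' scores, then
    averaged over every prompt carrying the tag. `level` is "basic_skills"
    or "advanced_skills" (assumed to hold lists of tag strings).
    """
    by_skill = {}
    for record in records:
        image_score = mean(record["HumanRatings"][model])
        for skill in record["Tags"][level]:
            by_skill.setdefault(skill, []).append(image_score)
    return {skill: mean(scores) for skill, scores in by_skill.items()}


def expected_counts(n_prompts, n_models, ratings_per_item=3):
    """Items and ratings implied by the Statistics table: one item per
    prompt-model pair and three human ratings per item."""
    n_items = n_prompts * n_models
    return n_items, n_items * ratings_per_item


if __name__ == "__main__":
    # Two made-up rows in the assumed layout (not real GenAI-Bench data).
    toy = [
        {"Tags": {"basic_skills": ["attribute", "scene"], "advanced_skills": []},
         "HumanRatings": {m: [4, 5, 3] for m in IMAGE_MODELS}},
        {"Tags": {"basic_skills": ["attribute"], "advanced_skills": ["counting"]},
         "HumanRatings": {m: [2, 2, 2] for m in IMAGE_MODELS}},
    ]
    print(mean_rating_per_model(toy)["DALLE_3"])   # 3
    print(mean_rating_per_skill(toy, "DALLE_3"))   # {'attribute': 3, 'scene': 4}
    print(expected_counts(1600, 6))                # (9600, 28800), GenAI-Bench
    print(expected_counts(800, 4))                 # (3200, 9600), GenAI-Bench-Video
```

With the real data, the split returned by `load_dataset` in the Usage section could be passed as `records`. The card does not name the split, so `print(dataset)` is the way to find it. Dropping the six image columns first, for example with the `datasets` method `remove_columns`, avoids decoding every image while iterating.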
### Human Ratings

We hired three trained human annotators to rate each generated image/video independently. We paid the local minimum wage of 12 dollars per hour, for a total of about 800 annotator hours.

## Dataset Construction

### Overall Process

![image/png](https://huggingface.co/datasets/BaiqiL/GenAI-Bench-pictures/resolve/main/Dataset%20Construction.jpg)

- **Prompt Collecting:** we source prompts from professional designers who use tools such as Midjourney and CIVITAI. This ensures the prompts encompass practical skills relevant to real-world applications and are free of subjective or inappropriate content.
- **Compositional Skills Tagging:** each GenAI-Bench prompt is carefully tagged with all the skills it evaluates.
- **Image/Video Collecting and Human Rating:** we then generate images and videos using state-of-the-art models such as SD-XL and Gen2. We follow the recommended annotation protocol to collect 1-to-5 Likert scale ratings of how well the generated visuals align with the input text prompts.

# Leaderboard

<img src="https://huggingface.co/datasets/BaiqiL/GenAI-Bench-pictures/resolve/main/vqascore_leaderboard.jpg" alt="leaderboard" width="500"/>

## Licensing Information

apache-2.0

## Maintenance

We will continuously update the GenAI-Bench benchmark. If you have any questions about the dataset or notice any issues, please contact [Baiqi Li](mailto:libaiqi123@gmail.com) or [Zhiqiu Lin](mailto:zhiqiul@andrew.cmu.edu). Our team is committed to maintaining this dataset in the long run to ensure its quality!

Provided by: maas
Created: 2025-04-11
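Dataset introduction

Background overview

GenAI-Bench-1600 is a dataset of 1,600 text prompts contributed by professional designers. It is built to evaluate compositional text-to-visual generation. It includes images generated by several leading models along with human ratings, and covers both basic and advanced compositional skills.

The summary above was collected and generated by 遇见数据集.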