
OpenAI-4o_t2i_human_preference

ModelScope community · Updated 2026-04-18 · Indexed 2025-04-05
Download link:
https://modelscope.cn/datasets/AI-ModelScope/OpenAI-4o_t2i_human_preference
Resource description:
<style> .vertical-container { display: flex; flex-direction: column; gap: 60px; } .horizontal-container { display: flex; flex-direction: row; justify-content: center; gap: 60px; } .image-container img { max-height: 250px; /* Set the desired height */ margin:0; object-fit: contain; /* Ensures the aspect ratio is maintained */ width: auto; /* Adjust width automatically based on height */ box-sizing: content-box; } .image-container img.big { max-height: 350px; /* Set the desired height */ } .image-container { display: flex; /* Aligns images side by side */ justify-content: space-around; /* Space them evenly */ align-items: center; /* Align them vertically */ gap: .5rem } .container { width: 90%; margin: 0 auto; } .text-center { text-align: center; } .score-amount { margin: 0; margin-top: 10px; } .score-percentage { font-size: 12px; font-weight: semi-bold; } </style>

# Rapidata OpenAI 4o Preference

<a href="https://www.rapidata.ai"> <img src="https://cdn-uploads.huggingface.co/production/uploads/66f5624c42b853e73e0738eb/jfxR79bOztqaC6_yNNnGU.jpeg" width="400" alt="Dataset visualization"> </a>

This T2I dataset contains over 200,000 human responses from roughly 45,000 individual annotators, collected in less than half a day using the [Rapidata Python API](https://docs.rapidata.ai), which is accessible to anyone and well suited to large-scale evaluation.

It evaluates OpenAI 4o (version from 26.3.2025) across three categories: preference, coherence, and alignment.

Explore our latest model rankings on our [website](https://www.rapidata.ai/benchmark).

If you get value from this dataset and would like to see more in the future, please consider liking it ❤️

## Overview

The evaluation consists of 1v1 comparisons between OpenAI 4o (version from 26.3.2025) and 12 other models: Ideogram V2, Recraft V2, Lumina-15-2-25, Frames-23-1-25, Imagen-3, Flux-1.1-pro, Flux-1-pro, DALL-E 3, Midjourney-5.2, Stable Diffusion 3, Aurora, and Janus-7b.
Below, you'll find key visualizations that highlight how these models compare in terms of prompt alignment and coherence, where OpenAI 4o (version from 26.3.2025) significantly outperforms the other models.

<div style="width: 100%; display: flex; justify-content: center; align-items: center; gap: 20px;"> <div style="width: 90%; max-width: 1000px;"> <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/fMf6_uredbYDY7Hzuyk9J.png" style="width: 95%; height: auto; display: block;"> </div> <div style="width: 90%; max-width: 1000px;"> <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/rMjvWjG8HFql65D47TGsZ.png" style="width: 100%; height: auto; display: block;"> </div> </div>

## Master of Absurd Prompts

The benchmark intentionally includes a range of absurd or conflicting prompts that target situations or scenes very unlikely to occur in the training data, such as *'A chair on a cat'* or *'Car is bigger than the airplane'*. Most other models struggle to adhere to these prompts consistently, but the 4o image generation model appears to be significantly ahead of the competition in this regard.
<div class="horizontal-container"> <div clas="container"> <div class="text-center"> <q>A chair on a cat.</q> </div> <div class="image-container"> <div> <h3 class="score-amount">OpenAI 4o</h3> <img class="big" src="https://cdn-uploads.huggingface.co/production/uploads/6710d82fd3a72fc574ea620f/kS2uE91Q3QAKxR205DxS_.webp" width=300> </div> <div> <h3 class="score-amount">Imagen 3 </h3> <img class="big" src="https://cdn-uploads.huggingface.co/production/uploads/6710d82fd3a72fc574ea620f/KKQRsy9xzJVs7QsYyhuzp.jpeg" width=300> </div> </div> </div> <div clas="container"> <div class="text-center"> <q>Car is bigger than the airplane.</q> </div> <div class="image-container"> <div> <h3 class="score-amount">OpenAI 4o</h3> <img class="big" src="https://cdn-uploads.huggingface.co/production/uploads/6710d82fd3a72fc574ea620f/TWSsbPFxVJgaHW0gVCR2a.webp" width=300> </div> <div> <h3 class="score-amount">Flux1.1-pro</h3> <img class="big" src="https://cdn-uploads.huggingface.co/production/uploads/6710d82fd3a72fc574ea620f/7w3Ls8a6PmuR1ZR1J72Zk.jpeg" width=300> </div> </div> </div> </div>

That being said, some of the 'absurd' prompts are still not fully solved.
<div class="horizontal-container"> <div clas="container"> <div class="text-center"> <q>A fish eating a pelican.</q> </div> <div class="image-container"> <div> <h3 class="score-amount">OpenAI 4o</h3> <img class="big" src="https://cdn-uploads.huggingface.co/production/uploads/6710d82fd3a72fc574ea620f/xsJ2E_0Kx5gJjIO6C29-Q.webp" width=300> </div> <div> <h3 class="score-amount">Recraft V2</h3> <img class="big" src="https://cdn-uploads.huggingface.co/production/uploads/6710d82fd3a72fc574ea620f/R7Zf5dmhvjUTgkBMEx9Ns.webp" width=300> </div> </div> </div> <div clas="container"> <div class="text-center"> <q>A horse riding an astronaut.</q> </div> <div class="image-container"> <div> <h3 class="score-amount">OpenAI 4o</h3> <img class="big" src="https://cdn-uploads.huggingface.co/production/uploads/6710d82fd3a72fc574ea620f/RretHzxWGlXsjD9gXmg2k.webp" width=300> </div> <div> <h3 class="score-amount">Ideogram</h3> <img class="big" src="https://cdn-uploads.huggingface.co/production/uploads/6710d82fd3a72fc574ea620f/qtVKN58c0JgCYKK2Xc5bT.png" width=300> </div> </div> </div> </div>

## Alignment

The alignment score quantifies how well an image matches its prompt. Users were asked: "Which image matches the description better?"
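
The percentages shown beside each image pair can be read as vote shares. The following is a minimal sketch, assuming each score is simply the fraction of annotator votes an image received in its 1v1 comparison; the dataset's actual aggregation may weight responses differently. The function name and the vote counts are hypothetical and not taken from the dataset.

```python
def vote_shares(votes_a: int, votes_b: int) -> tuple[float, float]:
    """Return each image's share of the votes in a 1v1 comparison, in percent."""
    total = votes_a + votes_b
    if total == 0:
        raise ValueError("at least one vote is required")
    share_a = 100.0 * votes_a / total
    return share_a, 100.0 - share_a


# Hypothetical counts: 1 of 36 annotators chose image A, 35 chose image B.
share_a, share_b = vote_shares(1, 35)
print(f"A: {share_a:.1f}%  B: {share_b:.1f}%")  # prints: A: 2.8%  B: 97.2%
```

Under this reading the two scores of a pair always sum to 100%, which matches the pairs displayed in this card.
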
<div class="vertical-container"> <div class="container"> <div class="text-center"> <q>A baseball player in a blue and white uniform is next to a player in black and white .</q> </div> <div class="image-container"> <div> <h3 class="score-amount">OpenAI 4o</h3> <div class="score-percentage">Score: 100%</div> <img style="border: 5px solid #18c54f;" src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/pzKcqdCXwVDZi5lwgoeGv.jpeg" width=500> </div> <div> <h3 class="score-amount">Stable Diffusion 3 </h3> <div class="score-percentage">Score: 0%</div> <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/rbxFhkeir8TUTK-vYDn6Q.jpeg" width=500> </div> </div> </div> <div class="container"> <div class="text-center"> <q>A couple of glasses are sitting on a table.</q> </div> <div class="image-container"> <div> <h3 class="score-amount">OpenAI 4o</h3> <div class="score-percentage">Score: 2.8%</div> <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/AY-I6WqgUF4Eh3thLkAqJ.jpeg" width=500> </div> <div> <h3 class="score-amount">Dalle-3</h3> <div class="score-percentage">Score: 97.2%</div> <img style="border: 5px solid #18c54f;" src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/3ygGq2P4dS6rfh5q-x3jb.jpeg" width=500> </div> </div> </div> </div>

## Coherence

The coherence score measures whether the generated image is logically consistent and free from artifacts or visual glitches. Without seeing the original prompt, users were asked: "Which image has **more** glitches and is **more** likely to be AI generated?"
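
Because the coherence question asks which image has more glitches, a lower glitch rating is the better result, unlike the alignment and preference scores. Here is a small sketch of picking the favored image per category under that reading; the category labels and the function name are illustrative and not part of the dataset.

```python
LOWER_IS_BETTER = {"coherence"}  # glitch rating: share of "more glitches" votes


def favored_model(category: str, scores: dict[str, float]) -> str:
    """Return the model favored by annotators in one 1v1 comparison.

    `scores` maps each model name to its percentage in that comparison.
    """
    if len(scores) != 2:
        raise ValueError("expected exactly two models in a 1v1 comparison")
    pick = min if category in LOWER_IS_BETTER else max
    return pick(scores, key=scores.get)


# Values taken from the example pairs shown in this card.
print(favored_model("coherence", {"OpenAI 4o": 98.6, "Recraft V2": 1.4}))  # Recraft V2
print(favored_model("alignment", {"OpenAI 4o": 2.8, "Dalle-3": 97.2}))     # Dalle-3
```

This agrees with the green border in the examples, which marks the image with the lower glitch rating in the coherence pairs and the higher score elsewhere.
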
<div class="vertical-container"> <div class="container"> <div class="image-container"> <div> <h3 class="score-amount">OpenAI 4o </h3> <div class="score-percentage">Glitch Rating: 0%</div> <img style="border: 5px solid #18c54f;" src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/DzuAiklD3R_pwe-yFtRM7.jpeg" width=500> </div> <div> <h3 class="score-amount">Lumina-15-2-25 </h3> <div class="score-percentage">Glitch Rating: 100%</div> <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/iAn4zphOEL_cpOorp0JNZ.jpeg" width=500> </div> </div> </div> <div class="container"> <div class="image-container"> <div> <h3 class="score-amount">OpenAI 4o </h3> <div class="score-percentage">Glitch Rating: 98.6%</div> <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/IeJHwzNc77tjVAKf8nGEk.jpeg" width=500> </div> <div> <h3 class="score-amount">Recraft V2</h3> <div class="score-percentage">Glitch Rating: 1.4%</div> <img style="border: 5px solid #18c54f;" src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/iCuVaPrVGbDeLHuqbMgkc.jpeg" width=500> </div> </div> </div> </div>

## Preference

The preference score reflects how visually appealing participants found each image, independent of the prompt. Users were asked: "Which image do you prefer?"
<div class="vertical-container"> <div class="container"> <div class="image-container"> <div> <h3 class="score-amount">OpenAI 4o</h3> <div class="score-percentage">Score: 100%</div> <img style="border: 5px solid #18c54f;" src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/ve4DVzU0kZznjA9N0AdkO.jpeg" width=500> </div> <div> <h3 class="score-amount">Lumina-15-2-25</h3> <div class="score-percentage">Score: 0%</div> <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/zTZRillcEV85C9gfLa25L.jpeg" width=500> </div> </div> </div> <div class="container"> <div class="image-container"> <div> <h3 class="score-amount">OpenAI 4o </h3> <div class="score-percentage">Score: 0%</div> <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/0EmcYSDQeseS1XSWyG-lb.jpeg" width=500> </div> <div> <h3 class="score-amount">Flux-1.1 Pro </h3> <div class="score-percentage">Score: 100%</div> <img style="border: 5px solid #18c54f;" src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/MO7RnVUWC0gR84PIKDuyI.jpeg" width=500> </div> </div> </div> </div>

## About Rapidata

Rapidata's technology makes collecting human feedback at scale faster and more accessible than ever before. Visit [rapidata.ai](https://www.rapidata.ai/) to learn more about how we're revolutionizing human feedback collection for AI development.

Provider: maas
Created: 2025-04-01