OpenAI-4o_t2i_human_preference
Source: ModelScope community · Updated 2026-04-18 · Indexed 2025-04-05
Download link:
https://modelscope.cn/datasets/AI-ModelScope/OpenAI-4o_t2i_human_preference
Description:
<style>
.vertical-container {
display: flex;
flex-direction: column;
gap: 60px;
}
.horizontal-container {
display: flex;
flex-direction: row;
justify-content: center;
gap: 60px;
}
.image-container img {
max-height: 250px; /* Set the desired height */
margin:0;
object-fit: contain; /* Ensures the aspect ratio is maintained */
width: auto; /* Adjust width automatically based on height */
box-sizing: content-box;
}
.image-container img.big {
max-height: 350px; /* Set the desired height */
}
.image-container {
display: flex; /* Aligns images side by side */
justify-content: space-around; /* Space them evenly */
align-items: center; /* Align them vertically */
gap: .5rem
}
.container {
width: 90%;
margin: 0 auto;
}
.text-center {
text-align: center;
}
.score-amount {
margin: 0;
margin-top: 10px;
}
.score-percentage {
font-size: 12px;
font-weight: 600; /* "semi-bold" is not a valid CSS value; 600 is the numeric equivalent */
}
</style>
# Rapidata OpenAI 4o Preference
<a href="https://www.rapidata.ai">
<img src="https://cdn-uploads.huggingface.co/production/uploads/66f5624c42b853e73e0738eb/jfxR79bOztqaC6_yNNnGU.jpeg" width="400" alt="Dataset visualization">
</a>
This T2I dataset contains over 200,000 human responses from roughly 45,000 individual annotators, collected in less than half a day using the [Rapidata Python API](https://docs.rapidata.ai), which is accessible to anyone and well suited to large-scale evaluation.
It evaluates OpenAI 4o (version from 26.3.2025) across three categories: preference, coherence, and alignment.
Explore our latest model rankings on our [website](https://www.rapidata.ai/benchmark).
If you get value from this dataset and would like to see more in the future, please consider liking it ❤️
## Overview
The evaluation consists of 1v1 comparisons between OpenAI 4o (version from 26.3.2025) and 12 other models: Ideogram V2, Recraft V2, Lumina-15-2-25, Frames-23-1-25, Imagen-3, Flux-1.1-pro, Flux-1-pro, DALL-E 3, Midjourney-5.2, Stable Diffusion 3, Aurora, and Janus-7b.
Below, you'll find key visualizations that highlight how these models compare in terms of prompt alignment and coherence, where OpenAI 4o (version from 26.3.2025) significantly outperforms the other models.
<div style="width: 100%; display: flex; justify-content: center; align-items: center; gap: 20px;">
<div style="width: 90%; max-width: 1000px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/fMf6_uredbYDY7Hzuyk9J.png" style="width: 95%; height: auto; display: block;">
</div>
<div style="width: 90%; max-width: 1000px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/rMjvWjG8HFql65D47TGsZ.png" style="width: 100%; height: auto; display: block;">
</div>
</div>
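
As a rough illustration of how 1v1 results like those charted above could be rolled up into a per-model figure, here is a minimal sketch. It assumes each comparison is available as a `(model_a, model_b, votes_a, votes_b)` tuple; that layout, the `win_rates` helper, and the vote counts are all hypothetical and not taken from the dataset's actual schema.

```python
from collections import defaultdict


def win_rates(comparisons):
    """Return each model's share of votes across all of its 1v1 matchups.

    `comparisons` is an iterable of (model_a, model_b, votes_a, votes_b)
    tuples. This record layout is hypothetical, not the dataset's schema.
    """
    won = defaultdict(int)
    total = defaultdict(int)
    for model_a, model_b, votes_a, votes_b in comparisons:
        n = votes_a + votes_b
        won[model_a] += votes_a
        won[model_b] += votes_b
        total[model_a] += n
        total[model_b] += n
    # Skip models that never received a vote to avoid dividing by zero.
    return {m: won[m] / total[m] for m in total if total[m] > 0}


# Made-up vote counts, for illustration only.
sample = [
    ("OpenAI 4o", "Imagen-3", 30, 10),
    ("OpenAI 4o", "Aurora", 20, 20),
    ("OpenAI 4o", "Janus-7b", 36, 4),
]
rates = win_rates(sample)
for model, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {rate:.1%}")
```

Because every matchup here involves OpenAI 4o, its rate pools votes from all opponents, while each other model's rate reflects only its single matchup against 4o.
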
## Master of Absurd Prompts
The benchmark intentionally includes a range of absurd or conflicting prompts that target situations or scenes very unlikely to occur in the training data,
such as *'A chair on a cat'* or *'Car is bigger than the airplane'*. Most other models struggle to adhere to these prompts consistently, but the 4o image generation model
appears to be significantly ahead of the competition in this regard.
<div class="horizontal-container">
<div class="container">
<div class="text-center">
<q>A chair on a cat.</q>
</div>
<div class="image-container">
<div>
<h3 class="score-amount">OpenAI 4o</h3>
<img class="big" src="https://cdn-uploads.huggingface.co/production/uploads/6710d82fd3a72fc574ea620f/kS2uE91Q3QAKxR205DxS_.webp" width=300>
</div>
<div>
<h3 class="score-amount">Imagen 3 </h3>
<img class="big" src="https://cdn-uploads.huggingface.co/production/uploads/6710d82fd3a72fc574ea620f/KKQRsy9xzJVs7QsYyhuzp.jpeg" width=300>
</div>
</div>
</div>
<div class="container">
<div class="text-center">
<q>Car is bigger than the airplane.</q>
</div>
<div class="image-container">
<div>
<h3 class="score-amount">OpenAI 4o</h3>
<img class="big" src="https://cdn-uploads.huggingface.co/production/uploads/6710d82fd3a72fc574ea620f/TWSsbPFxVJgaHW0gVCR2a.webp" width=300>
</div>
<div>
<h3 class="score-amount">Flux1.1-pro</h3>
<img class="big" src="https://cdn-uploads.huggingface.co/production/uploads/6710d82fd3a72fc574ea620f/7w3Ls8a6PmuR1ZR1J72Zk.jpeg" width=300>
</div>
</div>
</div>
</div>
That being said, some of the 'absurd' prompts are still not fully solved.
<div class="horizontal-container">
<div class="container">
<div class="text-center">
<q>A fish eating a pelican.</q>
</div>
<div class="image-container">
<div>
<h3 class="score-amount">OpenAI 4o</h3>
<img class="big" src="https://cdn-uploads.huggingface.co/production/uploads/6710d82fd3a72fc574ea620f/xsJ2E_0Kx5gJjIO6C29-Q.webp" width=300>
</div>
<div>
<h3 class="score-amount">Recraft V2</h3>
<img class="big" src="https://cdn-uploads.huggingface.co/production/uploads/6710d82fd3a72fc574ea620f/R7Zf5dmhvjUTgkBMEx9Ns.webp" width=300>
</div>
</div>
</div>
<div class="container">
<div class="text-center">
<q>A horse riding an astronaut.</q>
</div>
<div class="image-container">
<div>
<h3 class="score-amount">OpenAI 4o</h3>
<img class="big" src="https://cdn-uploads.huggingface.co/production/uploads/6710d82fd3a72fc574ea620f/RretHzxWGlXsjD9gXmg2k.webp" width=300>
</div>
<div>
<h3 class="score-amount">Ideogram</h3>
<img class="big" src="https://cdn-uploads.huggingface.co/production/uploads/6710d82fd3a72fc574ea620f/qtVKN58c0JgCYKK2Xc5bT.png" width=300>
</div>
</div>
</div>
</div>
## Alignment
The alignment score quantifies how well an image matches its prompt. Users were asked: "Which image matches the description better?"
<div class="vertical-container">
<div class="container">
<div class="text-center">
<q>A baseball player in a blue and white uniform is next to a player in black and white .</q>
</div>
<div class="image-container">
<div>
<h3 class="score-amount">OpenAI 4o</h3>
<div class="score-percentage">Score: 100%</div>
<img style="border: 5px solid #18c54f;" src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/pzKcqdCXwVDZi5lwgoeGv.jpeg" width=500>
</div>
<div>
<h3 class="score-amount">Stable Diffusion 3 </h3>
<div class="score-percentage">Score: 0%</div>
<img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/rbxFhkeir8TUTK-vYDn6Q.jpeg" width=500>
</div>
</div>
</div>
<div class="container">
<div class="text-center">
<q>A couple of glasses are sitting on a table.</q>
</div>
<div class="image-container">
<div>
<h3 class="score-amount">OpenAI 4o</h3>
<div class="score-percentage">Score: 2.8%</div>
<img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/AY-I6WqgUF4Eh3thLkAqJ.jpeg" width=500>
</div>
<div>
<h3 class="score-amount">Dalle-3</h3>
<div class="score-percentage">Score: 97.2%</div>
<img style="border: 5px solid #18c54f;" src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/3ygGq2P4dS6rfh5q-x3jb.jpeg" width=500>
</div>
</div>
</div>
</div>
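
The per-pair percentages shown above can be read as each image's share of the responses for that pair. The sketch below assumes a plain, unweighted share; the dataset may weight responses differently, and the `vote_shares` helper and the vote counts are hypothetical, chosen only because they reproduce a 2.8% / 97.2% split.

```python
def vote_shares(votes_first, votes_second):
    """Return the two images' shares of the votes as percentages (1 decimal).

    Assumes an unweighted share of responses; this is an illustrative
    assumption, not the dataset's documented scoring rule.
    """
    total = votes_first + votes_second
    if total == 0:
        raise ValueError("no votes recorded for this pair")
    pct_first = round(100 * votes_first / total, 1)
    # Derive the second share from the first so the pair always sums to 100.
    pct_second = round(100 - pct_first, 1)
    return pct_first, pct_second


# Hypothetical counts: 1 vote for the first image, 35 for the second.
first, second = vote_shares(1, 35)
print(f"Score: {first}% vs. {second}%")
```
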
## Coherence
The coherence score measures whether the generated image is logically consistent and free from artifacts or visual glitches. Without seeing the original prompt, users were asked: "Which image has **more** glitches and is **more** likely to be AI generated?"
<div class="vertical-container">
<div class="container">
<div class="image-container">
<div>
<h3 class="score-amount">OpenAI 4o </h3>
<div class="score-percentage">Glitch Rating: 0%</div>
<img style="border: 5px solid #18c54f;" src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/DzuAiklD3R_pwe-yFtRM7.jpeg" width=500>
</div>
<div>
<h3 class="score-amount">Lumina-15-2-25 </h3>
<div class="score-percentage">Glitch Rating: 100%</div>
<img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/iAn4zphOEL_cpOorp0JNZ.jpeg" width=500>
</div>
</div>
</div>
<div class="container">
<div class="image-container">
<div>
<h3 class="score-amount">OpenAI 4o </h3>
<div class="score-percentage">Glitch Rating: 98.6%</div>
<img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/IeJHwzNc77tjVAKf8nGEk.jpeg" width=500>
</div>
<div>
<h3 class="score-amount">Recraft V2</h3>
<div class="score-percentage">Glitch Rating: 1.4%</div>
<img style="border: 5px solid #18c54f;" src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/iCuVaPrVGbDeLHuqbMgkc.jpeg" width=500>
</div>
</div>
</div>
</div>
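
Because the coherence question asks which image has *more* glitches, the scale is inverted relative to the other two categories: the image with the lower glitch rating is the better one. A minimal sketch of that reading, using the percentages from the second pair above (the `coherence_winner` helper and the dictionary layout are hypothetical):

```python
def coherence_winner(glitch_ratings):
    """Return the model whose image was picked as glitchy least often.

    `glitch_ratings` maps a model name to its glitch rating in percent.
    Lower is better, so the winner is the minimum, not the maximum.
    """
    if not glitch_ratings:
        raise ValueError("need at least one rating")
    return min(glitch_ratings, key=glitch_ratings.get)


# Glitch ratings from the second example pair in this section.
pair = {"OpenAI 4o": 98.6, "Recraft V2": 1.4}
winner = coherence_winner(pair)
print(f"More coherent image: {winner}")
```
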
## Preference
The preference score reflects how visually appealing participants found each image, independent of the prompt. Users were asked: "Which image do you prefer?"
<div class="vertical-container">
<div class="container">
<div class="image-container">
<div>
<h3 class="score-amount">OpenAI 4o</h3>
<div class="score-percentage">Score: 100%</div>
<img style="border: 5px solid #18c54f;" src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/ve4DVzU0kZznjA9N0AdkO.jpeg" width=500>
</div>
<div>
<h3 class="score-amount">Lumina-15-2-25</h3>
<div class="score-percentage">Score: 0%</div>
<img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/zTZRillcEV85C9gfLa25L.jpeg" width=500>
</div>
</div>
</div>
<div class="container">
<div class="image-container">
<div>
<h3 class="score-amount">OpenAI 4o </h3>
<div class="score-percentage">Score: 0%</div>
<img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/0EmcYSDQeseS1XSWyG-lb.jpeg" width=500>
</div>
<div>
<h3 class="score-amount">Flux-1.1 Pro </h3>
<div class="score-percentage">Score: 100%</div>
<img style="border: 5px solid #18c54f;" src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/MO7RnVUWC0gR84PIKDuyI.jpeg" width=500>
</div>
</div>
</div>
</div>
## About Rapidata
Rapidata's technology makes collecting human feedback at scale faster and more accessible than ever before. Visit [rapidata.ai](https://www.rapidata.ai/) to learn more about how we're revolutionizing human feedback collection for AI development.
Provider: maas
Created: 2025-04-01