OpenGVLab_Lumina_t2i_human_preference
Source: ModelScope community. Updated 2025-10-09; indexed 2025-03-01.
Download link: https://modelscope.cn/datasets/Rapidata/OpenGVLab_Lumina_t2i_human_preference
# Rapidata Lumina Preference
This text-to-image (T2I) dataset contains over 400k human responses from over 86k individual annotators, collected in about 2 days using the [Rapidata Python API](https://docs.rapidata.ai), which is accessible to anyone and well suited to large-scale evaluation.
The responses evaluate Lumina across three categories: preference, coherence, and alignment.
Explore our latest model rankings on our [website](https://www.rapidata.ai/benchmark).
If you get value from this dataset and would like to see more in the future, please consider liking it.
## Overview
This T2I dataset contains over 400k human responses from over 86k individual annotators, collected in about 2 days.
It evaluates OpenGVLab's Lumina across three categories: preference, coherence, and alignment.
The evaluation consists of 1v1 comparisons between Lumina-15-2-25 and eight other models: Imagen-3, Flux-1.1-pro, Flux-1-pro, DALL-E 3, Midjourney-5.2, Stable Diffusion 3, Aurora and Janus-7b.
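As a rough illustration of how such 1v1 comparisons could be summarized, the sketch below tallies, per category, the share of matchups in which Lumina received the majority of responses. The row layout, the vote counts, and the `win_rates` helper are all hypothetical and are not taken from the dataset's actual schema.

```python
from collections import defaultdict

# Hypothetical rows: (category, opponent, lumina_votes, opponent_votes).
# These values are invented for illustration only.
rows = [
    ("alignment", "Janus-7b", 5, 0),
    ("alignment", "Flux-1-pro", 0, 5),
    ("coherence", "Aurora", 1, 4),
    ("preference", "Janus-7b", 4, 1),
]


def win_rates(rows):
    """Return, per category, the fraction of matchups Lumina won by majority."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for category, _opponent, lumina_votes, opponent_votes in rows:
        totals[category] += 1
        if lumina_votes > opponent_votes:
            wins[category] += 1
    return {category: wins[category] / totals[category] for category in totals}


rates = win_rates(rows)
for category, rate in rates.items():
    print(f"{category}: {rate:.0%}")
```

A majority-based tally is only one possible summary; averaging the per-pair vote shares would weight close and lopsided matchups differently.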
## Data collection
Since Lumina is not available through an API, the images were collected manually through the user interface. The date following each model name indicates when the images were generated.
## Alignment
The alignment score quantifies how well an image matches its prompt. Users were asked: "Which image matches the description better?"
- Prompt: "A chair on the left of a cat and on a airplane." Scores: Lumina-15-2-25 100%, Janus-7b 0%.
- Prompt: "A brown toilet with a white wooden seat." Scores: Lumina-15-2-25 0%, Flux-1 100%.
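The card does not spell out how a per-pair score is derived. One plausible reading, stated here as an assumption, is that each score is the share of responses favoring that image in the 1v1 comparison. A minimal sketch under that assumption, with a hypothetical `pair_scores` helper and made-up vote counts:

```python
def pair_scores(votes_a: int, votes_b: int) -> tuple[float, float]:
    """Return the percentage of responses favoring each image in a 1v1 pair.

    Assumes a score is a plain vote share; the dataset may weight responses
    differently.
    """
    total = votes_a + votes_b
    if total == 0:
        raise ValueError("a comparison needs at least one response")
    share_a = 100.0 * votes_a / total
    return share_a, 100.0 - share_a


# Hypothetical counts: every response favored the first image.
lumina_score, janus_score = pair_scores(votes_a=5, votes_b=0)
print(f"Lumina-15-2-25: {lumina_score:.0f}%  Janus-7b: {janus_score:.0f}%")
```

Under this reading, a 100% versus 0% result simply means that all collected responses for that pair picked the same image.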
## Coherence
The coherence score measures whether the generated image is logically consistent and free from artifacts or visual glitches. Without seeing the original prompt, users were asked: "Which image feels less weird or unnatural when you look closely? I.e., has fewer strange-looking visual errors or glitches?"
- Scores: Lumina-15-2-25 100%, Stable Diffusion 3 0%.
- Scores: Lumina-15-2-25 0%, Aurora 100%.
## Preference
The preference score reflects how visually appealing participants found each image, independent of the prompt. Users were asked: "Which image do you prefer?"
- Scores: Lumina-15-2-25 100%, Janus-7b 0%.
- Scores: Lumina-15-2-25 0%, DALL-E 3 100%.
## About Rapidata
Rapidata's technology makes collecting human feedback at scale faster and more accessible than ever before. Visit [rapidata.ai](https://www.rapidata.ai/) to learn more about how we're revolutionizing human feedback collection for AI development.
Provider: maas
Created: 2025-02-27



