Flux_SD3_MJ_Dalle_Human_Alignment_Dataset
Source: ModelScope community (魔搭社区). Updated 2025-12-05; indexed 2025-02-01.
Download link:
https://modelscope.cn/datasets/Rapidata/Flux_SD3_MJ_Dalle_Human_Alignment_Dataset
Description:
## **NOTE:** A newer version of this dataset is available: [Imagen3_Flux1.1_Flux1_SD3_MJ_Dalle_Human_Alignment_Dataset](https://huggingface.co/datasets/Rapidata/Imagen3_Flux1.1_Flux1_SD3_MJ_Dalle_Human_Alignment_Dataset)
# Rapidata Image Generation Alignment Dataset
<a href="https://www.rapidata.ai">
<img src="https://cdn-uploads.huggingface.co/production/uploads/66f5624c42b853e73e0738eb/jfxR79bOztqaC6_yNNnGU.jpeg" width="400" alt="Dataset visualization">
</a>
This dataset is one third of a 2M+ human annotation dataset that was split into three modalities: Preference, Coherence, and Text-to-Image Alignment.
- Link to the Coherence dataset: https://huggingface.co/datasets/Rapidata/Flux_SD3_MJ_Dalle_Human_Coherence_Dataset
- Link to the Preference dataset: https://huggingface.co/datasets/Rapidata/700k_Human_Preference_Dataset_FLUX_SD3_MJ_DALLE3
It was collected in about 2 days using the Rapidata Python API: https://docs.rapidata.ai
If you get value from this dataset and would like to see more in the future, please consider liking it.
## Overview
One of the largest human-annotated alignment datasets for text-to-image models, this release contains over 700,000 human preference votes, one third of our complete 2 million vote collection. This alignment dataset is part of a larger evaluation comparing images from leading AI models, including Flux.1, DALL-E 3, MidJourney, and Stable Diffusion. The complete collection includes two additional datasets of equal size focusing on image coherence and overall preference, available on our profile. The full collection was gathered in just 2 days using Rapidata's annotation technology, which shows how quickly human feedback can be collected at this scale.
Explore our latest model rankings on our [website](https://www.rapidata.ai/benchmark).
## Key Features
- **Massive Scale**: 700,000+ individual human preference votes collected in 48 hours
- **Global Representation**: Collected from 144,292 participants across 145 countries
- **Diverse Prompts**: 282 carefully curated prompts testing various aspects of image generation
- **Leading Models**: Comparisons between four state-of-the-art image generation models
- **Rigorous Methodology**: Uses pairwise comparisons with built-in quality controls
- **Rich Demographic Data**: Includes annotator information about age, gender, and geographic location
<img src="https://cdn-uploads.huggingface.co/production/uploads/66f5624c42b853e73e0738eb/1LVQj_G5bFio7w4WXPxsC.png" alt="Image description" width="650">
**Figure:** Overview of the distribution of annotators by continent (left) compared to the world population distribution (right).
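The pairwise-comparison setup described above can be summarized per model as a simple win rate. The following is a minimal sketch only: the record layout and the field names `model_a`, `model_b`, and `winner` are assumptions made for illustration and are not the dataset's actual column names, and the four votes are toy values rather than data from this release.

```python
from collections import defaultdict

# Hypothetical vote records (toy values, not taken from the dataset).
# Each vote compares two models on one prompt and names the model whose
# image the annotator judged to match the prompt better.
votes = [
    {"model_a": "Flux.1", "model_b": "DALL-E 3", "winner": "Flux.1"},
    {"model_a": "Flux.1", "model_b": "MidJourney", "winner": "MidJourney"},
    {"model_a": "Stable Diffusion", "model_b": "DALL-E 3", "winner": "DALL-E 3"},
    {"model_a": "Flux.1", "model_b": "Stable Diffusion", "winner": "Flux.1"},
]


def win_rates(votes):
    """Return {model: wins / comparisons} over all pairwise votes."""
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for vote in votes:
        # Both models in the pair took part in this comparison.
        for model in (vote["model_a"], vote["model_b"]):
            appearances[model] += 1
        wins[vote["winner"]] += 1
    return {model: wins[model] / appearances[model] for model in appearances}


rates = win_rates(votes)
for model, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{model}: {rate:.2f}")
```

A raw win rate ignores which opponents a model faced, so it is most meaningful when every pair of models is compared about equally often.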
## Applications
This dataset is invaluable for:
- Training and fine-tuning image generation models
- Understanding global preferences in AI-generated imagery
- Developing better evaluation metrics for generative models
- Researching cross-cultural aesthetic preferences
- Benchmarking new image generation models
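For the benchmarking use case, one common way to turn pairwise votes into a single ranking is a Bradley-Terry model. The sketch below is not the method used by Rapidata or by the cited paper; it is a generic illustration using the standard iterative (MM) update, and the model names `A`, `B`, `C` and their win counts are made-up placeholders.

```python
def bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths with the standard MM update.

    wins[(a, b)] is the number of votes in which model a beat model b.
    Returns strengths normalized to sum to 1.
    """
    models = sorted({m for pair in wins for m in pair})
    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for i in models:
            total_wins = sum(w for (a, _), w in wins.items() if a == i)
            denom = 0.0
            for j in models:
                if j == i:
                    continue
                # Number of comparisons between i and j, in either order.
                n_ij = wins.get((i, j), 0) + wins.get((j, i), 0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = total_wins / denom if denom else strength[i]
        norm = sum(updated.values())
        strength = {m: s / norm for m, s in updated.items()}
    return strength


# Toy win counts for three placeholder models (not results from the dataset).
toy_wins = {
    ("A", "B"): 60, ("B", "A"): 40,
    ("A", "C"): 70, ("C", "A"): 30,
    ("B", "C"): 55, ("C", "B"): 45,
}
strengths = bradley_terry(toy_wins)
ranking = sorted(strengths, key=strengths.get, reverse=True)
print(ranking)
```

Unlike a raw win rate, the fitted strengths account for the strength of each opponent, which matters when some model pairs receive more votes than others.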
## Data Collection Powered by Rapidata
What traditionally would take weeks or months of data collection was accomplished in just 48 hours through Rapidata's innovative annotation platform. Our technology enables:
- Lightning-fast data collection at massive scale
- Global reach across 145+ countries
- Built-in quality assurance mechanisms
- Comprehensive demographic representation
- Cost-effective large-scale annotation
## Citation
If you use this dataset in your research, please cite our startup, Rapidata, and our paper: "Finding the Subjective Truth: Collecting 2 Million Votes for Comprehensive Gen-AI Model Evaluation" (arXiv:2409.11904v2)
```
@misc{christodoulou2024findingsubjectivetruthcollecting,
title={Finding the Subjective Truth: Collecting 2 Million Votes for Comprehensive Gen-AI Model Evaluation},
author={Dimitrios Christodoulou and Mads Kuhlmann-Jørgensen},
year={2024},
eprint={2409.11904},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2409.11904},
}
```
## About Rapidata
Rapidata's technology makes collecting human feedback at scale faster and more accessible than ever before. Visit [rapidata.ai](https://www.rapidata.ai/) to learn more about how we're revolutionizing human feedback collection for AI development.
We created the dataset using our in-house developed [API](https://docs.rapidata.ai/), which you can access to gain near-instant human intelligence at your fingertips.
Provider:
maas
Created:
2025-01-25



