700k_Human_Preference_Dataset_FLUX_SD3_MJ_DALLE3

Name: 700k_Human_Preference_Dataset_FLUX_SD3_MJ_DALLE3
Creator: maas
Published: 2025-12-04 09:34:48
License: No description provided

ModelScope community; updated 2025-12-04; indexed 2025-02-01

Download link:

https://modelscope.cn/datasets/Rapidata/700k_Human_Preference_Dataset_FLUX_SD3_MJ_DALLE3


Resource description:

## **NOTE:** A newer version of this dataset is available: [Imagen3_Flux1.1_Flux1_SD3_MJ_Dalle_Human_Preference_Dataset](https://huggingface.co/datasets/Rapidata/Imagen3_Flux1.1_Flux1_SD3_MJ_Dalle_Human_Preference_Dataset)

# Rapidata Image Generation Preference Dataset

<a href="https://www.rapidata.ai">
<img src="https://cdn-uploads.huggingface.co/production/uploads/66f5624c42b853e73e0738eb/jfxR79bOztqaC6_yNNnGU.jpeg" width="400" alt="Dataset visualization">
</a>

This dataset is one third of a 2M+ human annotation dataset that was split into three modalities: Preference, Coherence, and Text-to-Image Alignment.

- Link to the Coherence dataset: https://huggingface.co/datasets/Rapidata/Flux_SD3_MJ_Dalle_Human_Coherence_Dataset
- Link to the Text-2-Image Alignment dataset: https://huggingface.co/datasets/Rapidata/Flux_SD3_MJ_Dalle_Human_Alignment_Dataset

It was collected in ~2 days using the Rapidata Python API: https://docs.rapidata.ai

If you get value from this dataset and would like to see more in the future, please consider liking it.

## Overview

One of the largest human preference datasets for text-to-image models, this release contains over 700,000 human preference votes, one third of our complete 2 million vote collection. This preference dataset is part of a larger evaluation comparing images from leading AI models including Flux.1, DALL-E 3, MidJourney, and Stable Diffusion. The complete collection includes two additional datasets of equal size focusing on image coherence and text-image alignment, available on our profile. This extensive dataset was collected in just 2 days using Rapidata's groundbreaking annotation technology, demonstrating unprecedented efficiency in large-scale human feedback collection. Explore our latest model rankings on our [website](https://www.rapidata.ai/benchmark).

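As a rough illustration of how pairwise preference votes can be turned into a model ranking, the sketch below computes a simple win rate per model. The `(model_a, model_b, winner)` record layout, the `win_rates` helper, and the toy votes are all assumptions made for this example; they are not the dataset's actual column names or data, and this is not necessarily the ranking method used for the Rapidata benchmark.

```python
from collections import defaultdict


def win_rates(votes):
    """Return {model: wins / comparisons} from (model_a, model_b, winner) votes.

    The tuple layout is hypothetical; adapt it to the real column names
    after inspecting the dataset.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for model_a, model_b, winner in votes:
        if winner not in (model_a, model_b):
            raise ValueError(f"winner {winner!r} is not one of the compared models")
        # Each vote counts as one comparison for both models involved.
        appearances[model_a] += 1
        appearances[model_b] += 1
        wins[winner] += 1
    return {model: wins[model] / appearances[model] for model in appearances}


# Made-up votes for illustration only; they are not taken from the dataset.
toy_votes = [
    ("Flux.1", "DALL-E 3", "Flux.1"),
    ("Flux.1", "MidJourney", "MidJourney"),
    ("Stable Diffusion", "DALL-E 3", "DALL-E 3"),
    ("Flux.1", "Stable Diffusion", "Flux.1"),
]

rates = win_rates(toy_votes)
for model, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {rate:.2f}")
```

A plain win rate ignores which opponents a model faced, so with uneven pairings a paired-comparison model may give a fairer ranking; the win rate is used here only because it is the simplest aggregate to read.
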
## Key Features

- **Massive Scale**: 700,000+ individual human preference votes collected in 48 hours
- **Global Representation**: Collected from 144,292 participants across 145 countries
- **Diverse Prompts**: 282 carefully curated prompts testing various aspects of image generation
- **Leading Models**: Comparisons between four state-of-the-art image generation models
- **Rigorous Methodology**: Uses pairwise comparisons with built-in quality controls
- **Rich Demographic Data**: Includes annotator information about age, gender, and geographic location

<img src="https://cdn-uploads.huggingface.co/production/uploads/66f5624c42b853e73e0738eb/1LVQj_G5bFio7w4WXPxsC.png" alt="Image description" width="650">

**Figure:** Overview of the distribution of annotators by continent (left) compared to the world population distribution (right)

## Applications

This dataset is invaluable for:

- Training and fine-tuning image generation models
- Understanding global preferences in AI-generated imagery
- Developing better evaluation metrics for generative models
- Researching cross-cultural aesthetic preferences
- Benchmarking new image generation models

## Data Collection Powered by Rapidata

What traditionally would take weeks or months of data collection was accomplished in just 48 hours through Rapidata's innovative annotation platform.

Our technology enables:

- Lightning-fast data collection at massive scale
- Global reach across 145+ countries
- Built-in quality assurance mechanisms
- Comprehensive demographic representation
- Cost-effective large-scale annotation

## Citation

If you use this dataset in your research, please cite our startup Rapidata and our paper: "Finding the Subjective Truth: Collecting 2 Million Votes for Comprehensive Gen-AI Model Evaluation" (arXiv:2409.11904v2)

```
@misc{christodoulou2024findingsubjectivetruthcollecting,
  title={Finding the Subjective Truth: Collecting 2 Million Votes for Comprehensive Gen-AI Model Evaluation},
  author={Dimitrios Christodoulou and Mads Kuhlmann-Jørgensen},
  year={2024},
  eprint={2409.11904},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2409.11904},
}
```

## About Rapidata

Rapidata's technology makes collecting human feedback at scale faster and more accessible than ever before. Visit [rapidata.ai](https://www.rapidata.ai/) to learn more about how we're revolutionizing human feedback collection for AI development.


Provider: maas

Created: 2025-01-25
