
700k_Human_Preference_Dataset_FLUX_SD3_MJ_DALLE3

ModelScope community | Updated 2025-12-04 | Indexed 2025-02-01
Download link:
https://modelscope.cn/datasets/Rapidata/700k_Human_Preference_Dataset_FLUX_SD3_MJ_DALLE3
Resource description:
## **NOTE:** A newer version of this dataset is available: [Imagen3_Flux1.1_Flux1_SD3_MJ_Dalle_Human_Preference_Dataset](https://huggingface.co/datasets/Rapidata/Imagen3_Flux1.1_Flux1_SD3_MJ_Dalle_Human_Preference_Dataset)

# Rapidata Image Generation Preference Dataset

<a href="https://www.rapidata.ai">
<img src="https://cdn-uploads.huggingface.co/production/uploads/66f5624c42b853e73e0738eb/jfxR79bOztqaC6_yNNnGU.jpeg" width="400" alt="Dataset visualization">
</a>

This dataset is one third of a 2M+ human annotation dataset that was split into three modalities: Preference, Coherence, and Text-to-Image Alignment.

- Link to the Coherence dataset: https://huggingface.co/datasets/Rapidata/Flux_SD3_MJ_Dalle_Human_Coherence_Dataset
- Link to the Text-2-Image Alignment dataset: https://huggingface.co/datasets/Rapidata/Flux_SD3_MJ_Dalle_Human_Alignment_Dataset

It was collected in ~2 days using the Rapidata Python API: https://docs.rapidata.ai

If you get value from this dataset and would like to see more in the future, please consider liking it.

## Overview

One of the largest human preference datasets for text-to-image models, this release contains over 700,000 human preference votes, one third of our complete 2 million vote collection. This preference dataset is part of a larger evaluation comparing images from leading AI models including Flux.1, DALL-E 3, MidJourney, and Stable Diffusion. The complete collection includes two additional datasets of equal size focusing on image coherence and text-image alignment, available on our profile. This extensive dataset was collected in just 2 days using Rapidata's groundbreaking annotation technology, demonstrating unprecedented efficiency in large-scale human feedback collection.

Explore our latest model rankings on our [website](https://www.rapidata.ai/benchmark).
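
As a rough illustration of how pairwise preference votes can be turned into a model ranking, the sketch below computes a simple per-model win rate. It is not the method behind the rankings linked above, and it does not assume the dataset's actual schema: the `(left, right, winner)` tuple layout, the function names, and the placeholder model names are all hypothetical, and the sample votes are made up.

```python
from collections import defaultdict


def win_rates(votes):
    """Return {model: wins / comparisons} for an iterable of votes.

    Each vote is a hypothetical (left, right, winner) tuple, where
    `winner` must be equal to either `left` or `right`.
    """
    wins = defaultdict(int)
    comparisons = defaultdict(int)
    for left, right, winner in votes:
        if winner not in (left, right):
            raise ValueError(f"winner {winner!r} is not one of the compared models")
        comparisons[left] += 1
        comparisons[right] += 1
        wins[winner] += 1
    # Models that never won still appear, with a rate of 0.0.
    return {model: wins[model] / comparisons[model] for model in comparisons}


def rank_models(votes):
    """Return model names sorted by descending win rate (ties broken by name)."""
    rates = win_rates(votes)
    return sorted(rates, key=lambda model: (-rates[model], model))


# Made-up votes with placeholder model names, for illustration only.
sample_votes = [
    ("model_a", "model_b", "model_a"),
    ("model_a", "model_c", "model_a"),
    ("model_b", "model_c", "model_c"),
    ("model_a", "model_b", "model_b"),
]

for name in rank_models(sample_votes):
    print(name, round(win_rates(sample_votes)[name], 3))
```

A raw win rate ignores which opponents a model faced; when models are not compared equally often against each other, a paired-comparison model such as Bradley-Terry may be a better fit.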

## Key Features

- **Massive Scale**: 700,000+ individual human preference votes collected in 48 hours
- **Global Representation**: Collected from 144,292 participants across 145 countries
- **Diverse Prompts**: 282 carefully curated prompts testing various aspects of image generation
- **Leading Models**: Comparisons between four state-of-the-art image generation models
- **Rigorous Methodology**: Uses pairwise comparisons with built-in quality controls
- **Rich Demographic Data**: Includes annotator information about age, gender, and geographic location

<img src="https://cdn-uploads.huggingface.co/production/uploads/66f5624c42b853e73e0738eb/1LVQj_G5bFio7w4WXPxsC.png" alt="Image description" width="650">

**Figure:** Overview of the distribution of annotators by continent (left) compared to the world population distribution (right)

## Applications

This dataset is invaluable for:

- Training and fine-tuning image generation models
- Understanding global preferences in AI-generated imagery
- Developing better evaluation metrics for generative models
- Researching cross-cultural aesthetic preferences
- Benchmarking new image generation models

## Data Collection Powered by Rapidata

What traditionally would take weeks or months of data collection was accomplished in just 48 hours through Rapidata's innovative annotation platform.
Our technology enables:

- Lightning-fast data collection at massive scale
- Global reach across 145+ countries
- Built-in quality assurance mechanisms
- Comprehensive demographic representation
- Cost-effective large-scale annotation

## Citation

If you use this dataset in your research, please cite our startup Rapidata and our paper: "Finding the Subjective Truth: Collecting 2 Million Votes for Comprehensive Gen-AI Model Evaluation" (arXiv:2409.11904v2)

```
@misc{christodoulou2024findingsubjectivetruthcollecting,
  title={Finding the Subjective Truth: Collecting 2 Million Votes for Comprehensive Gen-AI Model Evaluation},
  author={Dimitrios Christodoulou and Mads Kuhlmann-Jørgensen},
  year={2024},
  eprint={2409.11904},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2409.11904},
}
```

## About Rapidata

Rapidata's technology makes collecting human feedback at scale faster and more accessible than ever before. Visit [rapidata.ai](https://www.rapidata.ai/) to learn more about how we're revolutionizing human feedback collection for AI development.

Provider: maas
Created: 2025-01-25