human-coherence-preferences-images
Source: ModelScope community. Updated: 2025-12-05. Indexed: 2025-02-01.
Download link: https://modelscope.cn/datasets/Rapidata/human-coherence-preferences-images
Description:
# Rapidata Image Generation Coherence Dataset
<a href="https://www.rapidata.ai">
<img src="https://cdn-uploads.huggingface.co/production/uploads/66f5624c42b853e73e0738eb/jfxR79bOztqaC6_yNNnGU.jpeg" width="400" alt="Dataset visualization">
</a>
This dataset was collected in ~4 days using the [Rapidata Python API](https://docs.rapidata.ai), which is accessible to anyone and well suited to large-scale data annotation.
Explore our latest model rankings on our [website](https://www.rapidata.ai/benchmark).
If you get value from this dataset and would like to see more in the future, please consider liking it.
## Overview
One of the largest human-annotated coherence datasets for text-to-image models, this release contains over 1,200,000 human coherence votes. It builds on the previously published [Coherence Dataset](https://huggingface.co/datasets/Rapidata/Flux_SD3_MJ_Dalle_Human_Coherence_Dataset) and shows Rapidata's ability to rank new image generation models consistently and quickly.
Participants were shown two images and asked, "Which image feels less weird or unnatural when you look closely? I.e., has fewer strange-looking visual errors or glitches?"
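Each answer to this question is a pairwise vote, so a simple first summary is the share of comparisons each model wins. The sketch below is a minimal illustration, assuming a hypothetical record layout of `(model_a, model_b, winner)` tuples. The dataset's actual column names are not given here, and the model names and votes are made up.

```python
from collections import defaultdict

def win_rates(votes):
    """Return each model's share of the pairwise votes it took part in.

    `votes` is an iterable of (model_a, model_b, winner) tuples, where
    `winner` equals model_a or model_b. This layout is hypothetical and
    not the dataset's documented schema.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for model_a, model_b, winner in votes:
        if winner not in (model_a, model_b):
            raise ValueError(f"winner {winner!r} is not one of the compared models")
        appearances[model_a] += 1
        appearances[model_b] += 1
        wins[winner] += 1
    return {model: wins[model] / appearances[model] for model in appearances}

# Made-up votes for illustration only; not taken from the dataset.
example_votes = [
    ("model_x", "model_y", "model_x"),
    ("model_x", "model_y", "model_x"),
    ("model_x", "model_z", "model_z"),
    ("model_y", "model_z", "model_y"),
]
rates = win_rates(example_votes)
for model, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{model}: {rate:.2f}")
```

Raw win rates ignore which opponents a model faced, so they are best read as a quick sanity check rather than a final ranking.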
## Key Features
- **Massive Scale**: 1,200,000+ individual human coherence votes collected in under 100 hours
- **Global Representation**: Collected from participants across the globe
- **Diverse Prompts**: Carefully curated prompts testing various aspects of image generation
- **Leading Models**: Comparisons between state-of-the-art image generation models
<img src="https://cdn-uploads.huggingface.co/production/uploads/66f5624c42b853e73e0738eb/1LVQj_G5bFio7w4WXPxsC.png" alt="Image description" width="650">
**Figure:** Overview of the distribution of annotators by continent (left) compared to the world population distribution (right)
## Applications
This dataset is invaluable for:
- Benchmarking new image generation models
- Developing better evaluation metrics for generative models
- Understanding global preferences in AI-generated imagery
- Training and fine-tuning image generation models
- Researching cross-cultural aesthetic preferences
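For the benchmarking use case, one common way to turn pairwise votes into a model ranking is a Bradley-Terry fit. The sketch below uses the standard iterative (minorization-maximization) update on made-up vote counts. The source does not state which ranking method Rapidata uses, so treat this as one possible approach; the function name, model names, and counts are all hypothetical.

```python
def bradley_terry(pair_wins, iterations=200):
    """Estimate Bradley-Terry strengths from pairwise win counts.

    `pair_wins[(i, j)]` is the number of votes preferring model i over
    model j. Strengths are rescaled each round to sum to the model count.
    """
    models = sorted({m for pair in pair_wins for m in pair})
    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for i in models:
            # Total votes won by model i against any opponent.
            total_wins = sum(w for (a, _), w in pair_wins.items() if a == i)
            denom = 0.0
            for j in models:
                if j == i:
                    continue
                # Number of comparisons between i and j, in either order.
                n_ij = pair_wins.get((i, j), 0) + pair_wins.get((j, i), 0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = total_wins / denom if denom else strength[i]
        norm = sum(updated.values())
        strength = {m: s * len(models) / norm for m, s in updated.items()}
    return strength

# Made-up counts for illustration only; not taken from the dataset.
example_counts = {
    ("a", "b"): 70, ("b", "a"): 30,
    ("b", "c"): 60, ("c", "b"): 40,
    ("a", "c"): 80, ("c", "a"): 20,
}
scores = bradley_terry(example_counts)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Unlike raw win rates, this fit accounts for the strength of the opponents each model was compared against, which matters when models do not all face each other equally often.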
## Data Collection Powered by Rapidata
Data collection that would traditionally take weeks or months was completed in under 100 hours on Rapidata's annotation platform. Our technology enables:
- Lightning-fast data collection at massive scale
- Global reach across 145+ countries
- Built-in quality assurance mechanisms
- Comprehensive demographic representation
- Cost-effective large-scale annotation
## About Rapidata
Rapidata's technology makes collecting human feedback at scale faster and more accessible than ever before. Visit [rapidata.ai](https://www.rapidata.ai/) to learn more about how we're revolutionizing human feedback collection for AI development.
Provider: maas
Created: 2025-01-25