open-image-preferences-v1-results
Dataset Card for image-preferences-results
Overview
- Dataset name: image-preferences-results
- Size category: 10K<n<100K
- Tags: rlfh, argilla, human-feedback
- License: apache-2.0
Goal
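This project aims to create 10K text-to-image preference pairs. These pairs can be used to evaluate how image generation models perform across a wide range of common image categories, using prompts of varying difficulty.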
Dataset Structure
Fields
| Field Name | Title | Type | Required | Markdown |
|---|---|---|---|---|
| images | Images | custom | True | |
Questions
| Question Name | Title | Type | Required | Description | Values/Labels |
|---|---|---|---|---|---|
| preference | Which image is better according to prompt adherence and aesthetics? | label_selection | True | Take a look at the guidelines (bottom left corner) to get more familiar with the project examples and our community. | [image_1, image_2, both_good, both_bad, toxic_content] |
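Each response to the `preference` question is one of the five labels listed in the table. As a minimal sketch, assuming the responses for one record are available as a plain list of label strings (as in the `preference.responses` column of the `datasets` export), the hypothetical helper `tally_preferences` below counts the votes and rejects any value outside the label set. It is an illustration, not part of Argilla.

```python
from collections import Counter

# The five labels of the `preference` label_selection question.
PREFERENCE_LABELS = ("image_1", "image_2", "both_good", "both_bad", "toxic_content")


def tally_preferences(responses):
    """Count how often each label was chosen, rejecting unknown labels."""
    counts = Counter()
    for value in responses:
        if value not in PREFERENCE_LABELS:
            raise ValueError(f"unexpected preference label: {value!r}")
        counts[value] += 1
    # Report every label, including the ones no annotator picked.
    return {label: counts[label] for label in PREFERENCE_LABELS}
```

For example, `tally_preferences(["both_good", "image_1", "image_1"])` returns `{"image_1": 2, "image_2": 0, "both_good": 1, "both_bad": 0, "toxic_content": 0}`.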
Metadata
| Metadata Name | Title | Type | Values | Visible for Annotators |
|---|---|---|---|---|
| model_1 | model_1 | | - | True |
| model_2 | model_2 | | - | True |
| evolution | evolution | | - | True |
Vectors
| Vector Name | Title | Dimensions |
|---|---|---|
| prompt | prompt | [1, 256] |
Data Instances
An example of a dataset instance in Argilla looks as follows:
```json
{
  "_server_id": "c2306976-5e44-4ad4-b2ce-8a510ec6086b",
  "fields": {
    "images": {
      "image_1": "https://huggingface.co/datasets/data-is-better-together/image-preferences-filtered/resolve/main/image_quality_dev/3368.jpg",
      "image_2": "https://huggingface.co/datasets/data-is-better-together/image-preferences-filtered/resolve/main/image_quality_sd/3368.jpg",
      "prompt": "a bustling manga street, devoid of vehicles, detailed with vibrant colors and dynamic line work, characters in the background adding life and movement, under a soft golden hour light, with rich textures and a lively atmosphere, high resolution, sharp focus"
    }
  },
  "id": "3368-quality",
  "metadata": {
    "category": "Manga",
    "evolution": "quality",
    "model_1": "dev",
    "model_2": "sd",
    "sub_category": "detailed"
  },
  "responses": {
    "preference": [
      {
        "user_id": "50b9a890-173b-4999-bffa-fc0524ba6c63",
        "value": "both_good"
      },
      {
        "user_id": "caf19767-2989-4b3c-a653-9c30afc6361d",
        "value": "image_1"
      },
      {
        "user_id": "ae3e20b2-9aeb-4165-af54-69eac3f2448b",
        "value": "image_1"
      }
    ]
  },
  "status": "completed",
  "suggestions": {},
  "vectors": {}
}
```
The same record in HuggingFace `datasets` looks as follows:
```json
{
  "_server_id": "c2306976-5e44-4ad4-b2ce-8a510ec6086b",
  "category": "Manga",
  "evolution": "quality",
  "id": "3368-quality",
  "images": {
    "image_1": "https://huggingface.co/datasets/data-is-better-together/image-preferences-filtered/resolve/main/image_quality_dev/3368.jpg",
    "image_2": "https://huggingface.co/datasets/data-is-better-together/image-preferences-filtered/resolve/main/image_quality_sd/3368.jpg",
    "prompt": "a bustling manga street, devoid of vehicles, detailed with vibrant colors and dynamic line work, characters in the background adding life and movement, under a soft golden hour light, with rich textures and a lively atmosphere, high resolution, sharp focus"
  },
  "model_1": "dev",
  "model_2": "sd",
  "preference.responses": ["both_good", "image_1", "image_1"],
  "preference.responses.status": ["submitted", "submitted", "submitted"],
  "preference.responses.users": [
    "50b9a890-173b-4999-bffa-fc0524ba6c63",
    "caf19767-2989-4b3c-a653-9c30afc6361d",
    "ae3e20b2-9aeb-4165-af54-69eac3f2448b"
  ],
  "prompt": null,
  "status": "completed",
  "sub_category": "detailed"
}
```
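The two layouts carry the same annotations. As a rough sketch of how they correspond (this is not Argilla's own export code), the hypothetical helper `flatten_record` below lifts `fields` and `metadata` to top-level columns and splits each question's responses into parallel `<question>.responses` and `<question>.responses.users` lists. It leaves out the `preference.responses.status` and top-level `prompt` columns, because the Argilla record shown earlier does not contain the information needed to fill them.

```python
def flatten_record(record):
    """Map an Argilla-style record onto the flat column layout used by `datasets`.

    Hypothetical helper for illustration only; it is not Argilla's exporter.
    """
    flat = {
        "_server_id": record["_server_id"],
        "id": record["id"],
        "status": record["status"],
    }
    # Field values (here just `images`) become top-level columns.
    flat.update(record["fields"])
    # Metadata properties (category, evolution, model_1, ...) become columns too.
    flat.update(record["metadata"])
    # Each question's responses are split into parallel value and user lists.
    for question, answers in record["responses"].items():
        flat[f"{question}.responses"] = [answer["value"] for answer in answers]
        flat[f"{question}.responses.users"] = [answer["user_id"] for answer in answers]
    return flat
```

Applied to the Argilla record above, this yields `"preference.responses": ["both_good", "image_1", "image_1"]` and the matching list of user IDs, as in the `datasets` version.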
Data Splits
The dataset contains a single split: `train`.
Dataset Creation
Annotation Guidelines
Image Preference Task
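The goal is to collect preferences about images. We want to know which image in each pair is the better one, so that we can train an AI model to generate images like the best ones.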
Definition of the Best Image
The best image should contain all the attributes described in the prompt and be aesthetically pleasing with respect to the prompt.
Using the Dataset
Load with Argilla
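```python
import argilla as rg

# Downloads the dataset from the Hugging Face Hub and creates it, with its
# settings and records, on the Argilla server the default client connects to.
ds = rg.Dataset.from_hub("data-is-better-together/image-preferences-results", settings="auto")
```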
Load with `datasets`
```python
from datasets import load_dataset

# Returns a DatasetDict; all records are in the single "train" split.
ds = load_dataset("data-is-better-together/image-preferences-results")
```
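To train or evaluate with these annotations, each record has to be reduced to a single preference. One possible rule, sketched here under our own assumptions rather than as the project's method, is a strict majority vote: keep a record only when more than half of its annotators chose `image_1` or `image_2`, then emit the winner as `chosen` and the other image as `rejected`. The function `to_preference_pairs` is hypothetical and expects rows in the flat `datasets` layout, taking the prompt from `images["prompt"]` because the top-level `prompt` column is `null` in the example record.

```python
from collections import Counter


def to_preference_pairs(rows):
    """Turn flat records into prompt/chosen/rejected pairs by strict majority vote.

    Assumption: records whose majority is a tie, `both_good`, `both_bad`
    or `toxic_content` are skipped because they name no single winner.
    """
    pairs = []
    for row in rows:
        votes = row["preference.responses"]
        if not votes:
            continue
        label, count = Counter(votes).most_common(1)[0]
        # Require more than half of the votes for a single image.
        if count * 2 <= len(votes) or label not in ("image_1", "image_2"):
            continue
        loser = "image_2" if label == "image_1" else "image_1"
        images = row["images"]
        pairs.append({
            "prompt": images["prompt"],
            "chosen": images[label],
            "rejected": images[loser],
        })
    return pairs
```

For example, `to_preference_pairs(ds["train"])` iterates over the loaded split. For the Manga record shown earlier (votes `both_good`, `image_1`, `image_1`), it keeps the `image_quality_dev` image as `chosen` and the `image_quality_sd` image as `rejected`. A looser plurality rule would keep more pairs at the cost of noisier labels.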




