
data-is-better-together/image_preferences_results

Source: Hugging Face. Updated 2024-11-10. Indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/data-is-better-together/image_preferences_results
Description:
---
size_categories: n<1K
tags:
- rlfh
- argilla
- human-feedback
---

# Dataset Card for image_preferences_results

This dataset has been created with [Argilla](https://github.com/argilla-io/argilla). As shown in the sections below, this dataset can be loaded into your Argilla server as explained in [Load with Argilla](#load-with-argilla), or used directly with the `datasets` library in [Load with `datasets`](#load-with-datasets).

## Using this dataset with Argilla

To load with Argilla, install Argilla with `pip install argilla --upgrade` and then use the following code:

```python
import argilla as rg

ds = rg.Dataset.from_hub("DIBT/image_preferences_results")
```

This will load the settings and records from the dataset repository and push them to your Argilla server for exploration and annotation.

## Using this dataset with `datasets`

To load the records of this dataset with `datasets`, install `datasets` with `pip install datasets --upgrade` and then use the following code:

```python
from datasets import load_dataset

ds = load_dataset("DIBT/image_preferences_results")
```

This will only load the records of the dataset, but not the Argilla settings.

## Dataset Structure

This dataset repo contains:

* Dataset records in a format compatible with Hugging Face `datasets`. These records will be loaded automatically when using `rg.Dataset.from_hub` and can be loaded independently using the `datasets` library via `load_dataset`.
* The [annotation guidelines](#annotation-guidelines) that have been used for building and curating the dataset, if they've been defined in Argilla.
* A dataset configuration folder conforming to the Argilla dataset format in `.argilla`.

The dataset is created in Argilla with: **fields**, **questions**, **suggestions**, **metadata**, **vectors**, and **guidelines**.

### Fields

The **fields** are the features or text of a dataset's records. For example, the 'text' column of a text classification dataset or the 'prompt' column of an instruction following dataset.

| Field Name | Title | Type | Required | Markdown |
| ---------- | ----- | ---- | -------- | -------- |
| images | images | custom | True | |

### Questions

The **questions** are the questions that will be asked to the annotators. They can be of different types, such as rating, text, label_selection, multi_label_selection, or ranking.

| Question Name | Title | Type | Required | Description | Values/Labels |
| ------------- | ----- | ---- | -------- | ----------- | ------------- |
| preference | preference | label_selection | True | Which image do you prefer given the prompt? | ['image_1', 'image_2', 'both_good', 'both_bad'] |

### Data Instances

An example of a dataset instance in Argilla looks as follows:

```json
{
    "_server_id": "30403740-6a5e-48d7-839e-dcea7ad0dfda",
    "fields": {
        "images": {
            "image_1": "https://huggingface.co/datasets/DIBT/img_prefs_style/resolve/main/artifacts/image_generation_0/images/b172c7078a07c159f5f8da7bd1220ddd.jpeg",
            "image_2": "https://huggingface.co/datasets/DIBT/img_prefs_style/resolve/main/artifacts/image_generation_2/images/b172c7078a07c159f5f8da7bd1220ddd.jpeg",
            "prompt": "8-bit intellect, pixelated wisdom, retro digital brain, vintage game insight, soft neon glow, intricate pixel art, vibrant color palette, nostalgic ambiance"
        }
    },
    "id": "f5224be1-2e1b-428e-94b1-9c0f397092fa",
    "metadata": {
        "category": "Animation",
        "evolution": "quality",
        "model_1": "schnell",
        "model_2": "dev",
        "sub_category": "Pixel Art"
    },
    "responses": {
        "preference": [
            {
                "user_id": "c53e62ab-d792-4854-98f6-593b2ffb55bc",
                "value": "image_2"
            },
            {
                "user_id": "b1ab2cdd-29b8-4cf9-b6e0-7543589d21a3",
                "value": "image_2"
            },
            {
                "user_id": "da3e5871-920c-44da-8c44-1e94260c581e",
                "value": "both_good"
            },
            {
                "user_id": "b31dd1ed-78b6-4d50-8f11-7ce32ba17d64",
                "value": "image_2"
            },
            {
                "user_id": "6b984f66-86b3-421e-a32c-cd3592ee27a1",
                "value": "both_bad"
            }
        ]
    },
    "status": "completed",
    "suggestions": {},
    "vectors": {}
}
```

While the same record in Hugging Face `datasets` looks as follows:

```json
{
    "_server_id": "30403740-6a5e-48d7-839e-dcea7ad0dfda",
    "category": "Animation",
    "evolution": "quality",
    "id": "f5224be1-2e1b-428e-94b1-9c0f397092fa",
    "images": {
        "image_1": "https://huggingface.co/datasets/DIBT/img_prefs_style/resolve/main/artifacts/image_generation_0/images/b172c7078a07c159f5f8da7bd1220ddd.jpeg",
        "image_2": "https://huggingface.co/datasets/DIBT/img_prefs_style/resolve/main/artifacts/image_generation_2/images/b172c7078a07c159f5f8da7bd1220ddd.jpeg",
        "prompt": "8-bit intellect, pixelated wisdom, retro digital brain, vintage game insight, soft neon glow, intricate pixel art, vibrant color palette, nostalgic ambiance"
    },
    "model_1": "schnell",
    "model_2": "dev",
    "preference.responses": [
        "image_2",
        "image_2",
        "both_good",
        "image_2",
        "both_bad"
    ],
    "preference.responses.status": [
        "submitted",
        "submitted",
        "submitted",
        "submitted",
        "submitted"
    ],
    "preference.responses.users": [
        "c53e62ab-d792-4854-98f6-593b2ffb55bc",
        "b1ab2cdd-29b8-4cf9-b6e0-7543589d21a3",
        "da3e5871-920c-44da-8c44-1e94260c581e",
        "b31dd1ed-78b6-4d50-8f11-7ce32ba17d64",
        "6b984f66-86b3-421e-a32c-cd3592ee27a1"
    ],
    "status": "completed",
    "sub_category": "Pixel Art"
}
```

### Data Splits

The dataset contains a single split, which is `train`.

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation guidelines

[More Information Needed]

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

[More Information Needed]

### Contributions

[More Information Needed]
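
### Example: tallying annotator votes

The flattened `datasets` record shown under Data Instances stores one label per annotator in `preference.responses`. As a minimal sketch of how those votes might be aggregated, the snippet below tallies them with `collections.Counter`. The helper `summarize_preference` is hypothetical and not part of Argilla or `datasets`, and the mapping of `image_1`/`image_2` to `model_1`/`model_2` is an assumption based on the field names, which the card does not confirm.

```python
from collections import Counter

# Flattened record copied from the `datasets` example in this card
# (image URLs and prompt omitted for brevity).
record = {
    "id": "f5224be1-2e1b-428e-94b1-9c0f397092fa",
    "model_1": "schnell",
    "model_2": "dev",
    "preference.responses": [
        "image_2", "image_2", "both_good", "image_2", "both_bad",
    ],
}


def summarize_preference(rec):
    """Hypothetical helper: tally votes and report the most common label."""
    votes = Counter(rec["preference.responses"])
    if not votes:
        return None  # no annotator has responded yet
    # On a tie, most_common() returns the label that was seen first.
    label, count = votes.most_common(1)[0]
    # Assumption: image_1 was generated by model_1 and image_2 by model_2.
    winner = {"image_1": rec["model_1"], "image_2": rec["model_2"]}.get(label)
    return {
        "label": label,
        "votes": count,
        "total": sum(votes.values()),
        "winner_model": winner,  # None for both_good / both_bad
    }


summary = summarize_preference(record)
print(summary)
```

For the sample record, three of the five annotators chose `image_2`. A stricter rule, such as requiring a minimum agreement ratio before accepting a label, could replace the plain majority vote.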

Provider:
data-is-better-together