AILab-CVC/SEED-Data-Edit-Part1-Openimages

Name: AILab-CVC/SEED-Data-Edit-Part1-Openimages
Creator: AILab-CVC
Published: 2024-05-05 04:24:32
License: not specified in the listing (the dataset card below states CC-BY-NC-4.0)

Source: Hugging Face. Updated 2024-05-05; indexed 2024-06-12.

Download link:

https://hf-mirror.com/datasets/AILab-CVC/SEED-Data-Edit-Part1-Openimages

Resource description:

---
license: cc-by-nc-4.0
task_categories:
- text-to-image
language:
- en
size_categories:
- 1M<n<10M
---

## SEED-Data-Edit

![image](https://github.com/AILab-CVC/SEED-X/blob/main/demos/SEED-Data-Edit.jpg?raw=true)

SEED-Data-Edit is a hybrid dataset for **instruction-guided image editing** with a total of 3.7M image editing pairs. It comprises three distinct types of data:

**Part-1**: Large-scale high-quality editing data produced by automated pipelines (3.5M editing pairs).

**Part-2**: Real-world scenario data collected from the internet (52K editing pairs).

**Part-3**: High-precision multi-turn editing data annotated by humans (95K editing pairs, 21K multi-turn rounds with a maximum of 5 rounds).

This repo contains Part-1 of SEED-Data-Edit, with source images from [Openimages](https://arxiv.org/pdf/1811.00982).

After downloading the data, first reassemble the split files into the original .tar.gz files as shown below, then extract the archives.

```bash
cat source_images.tar.gz.part-* > source_images.tar.gz
cat target_images.tar.gz.part-* > target_images.tar.gz
```

The folder "annotations" contains the original instructions, while the folder "annotations_GPT4V" stores a small portion of the instructions that have been rewritten by GPT-4V.

## SEED-X-Edit

You can download the image editing model SEED-X-Edit from [Model](https://huggingface.co/AILab-CVC/SEED-X-17B/tree/main/seed_x_edit). It is instruction-tuned from the pre-trained [SEED-X](https://arxiv.org/abs/2404.14396) with SEED-Data-Edit.

For inference with SEED-X-Edit, refer to [SEED-X](https://github.com/AILab-CVC/SEED-X/tree/main).

![image](https://github.com/AILab-CVC/SEED-X/blob/main/demos/edit_comparison.jpg?raw=true)

## License

SEED-Data-Edit is released under the CC-BY-NC-4.0 license for non-commercial research purposes only. Any use of the dataset for commercial purposes is strictly prohibited.

For Part-1, we use images from [Unsplash](https://github.com/unsplash/datasets) and [Openimages](https://arxiv.org/pdf/1811.00982).

For Part-2, we collect images from [Photoshopbattles](https://www.reddit.com/r/photoshopbattles/), [Photoshop gurus](https://www.photoshopgurus.com/forum/), [Photoshoprequest](https://www.reddit.com/r/PhotoshopRequest/), and [Zhopped](http://zhopped.com/).

For Part-3, we use images from [Unsplash](https://github.com/unsplash/datasets), [SAM](https://arxiv.org/abs/2304.02643), and [JourneyDB](https://arxiv.org/abs/2307.00716).

Tencent does not hold the copyright for these images; the copyright belongs to the original owners. If any image in SEED-Data-Edit infringes upon your rights, please contact us immediately and we will promptly remove the corresponding data.

Provider:

AILab-CVC

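Summary of original information

Dataset overview

Dataset name

SEED-Data-Edit

Dataset type

Hybrid dataset for instruction-guided image editing.

Dataset composition

Part-1: Large-scale high-quality editing data produced by automated pipelines (3.5M editing pairs).
Part-2: Real-world scenario data collected from the internet (52K editing pairs).
Part-3: High-precision multi-turn editing data annotated by humans (95K editing pairs, 21K multi-turn rounds, at most 5 rounds).

Dataset size

1M<n<10M

Language

English (en)

License

CC-BY-NC-4.0, for non-commercial research use only.

Data sources

Part-1 (this repository): source images come from Openimages.
Part-2: images collected from Photoshopbattles, Photoshop gurus, Photoshoprequest, and Zhopped.
Part-3: images come from Unsplash, SAM, and JourneyDB.

Usage instructions

After downloading, reassemble the split files into the original .tar.gz files, then extract them.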

```bash
cat source_images.tar.gz.part-* > source_images.tar.gz
cat target_images.tar.gz.part-* > target_images.tar.gz
```
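Where a shell is not available, the same reassembly step can be done in Python. The following is a minimal sketch using only the standard library, not an official tool of the dataset: the helper names `reassemble` and `extract` are hypothetical, and the only file-name assumption is the `.part-*` pattern from the commands above. Parts are joined in lexicographic order of their names, which should match the order the shell glob produces for typical split suffixes.

```python
import shutil
import tarfile
from pathlib import Path


def reassemble(parts_dir, archive_name):
    """Concatenate `<archive_name>.part-*` files in `parts_dir` into one archive.

    Hypothetical helper: a Python equivalent of
    `cat <archive_name>.part-* > <archive_name>`.
    """
    parts_dir = Path(parts_dir)
    # Sort by name so the parts are joined in a deterministic order.
    parts = sorted(parts_dir.glob(archive_name + ".part-*"))
    if not parts:
        raise FileNotFoundError(
            f"no files matching {archive_name}.part-* in {parts_dir}"
        )
    target = parts_dir / archive_name
    with open(target, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream in chunks so large parts are never loaded fully into memory.
                shutil.copyfileobj(src, out)
    return target


def extract(archive_path, dest_dir):
    """Extract a gzip-compressed tar archive into `dest_dir` (hypothetical helper)."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
    return dest_dir
```

A call such as `extract(reassemble("downloads", "source_images.tar.gz"), "data/source_images")` would then rebuild and unpack the source images; the directory names `downloads` and `data/source_images` are placeholders, and the same call with `target_images.tar.gz` handles the target images.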

The folder "annotations" contains the original instructions; "annotations_GPT4V" stores a small portion of the instructions, rewritten by GPT-4V.
