AILab-CVC/SEED-Data-Edit-Part1-Openimages

Source: Hugging Face; updated 2024-05-05; indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/AILab-CVC/SEED-Data-Edit-Part1-Openimages
Resource overview:

Dataset card metadata:

- license: cc-by-nc-4.0
- task_categories: text-to-image
- language: en
- size_categories: `1M<n<10M`

## SEED-Data-Edit

![image](https://github.com/AILab-CVC/SEED-X/blob/main/demos/SEED-Data-Edit.jpg?raw=true)

SEED-Data-Edit is a hybrid dataset for **instruction-guided image editing** with a total of 3.7M image editing pairs, comprising three distinct types of data:

- **Part-1**: Large-scale, high-quality editing data produced by automated pipelines (3.5M editing pairs).
- **Part-2**: Real-world scenario data collected from the internet (52K editing pairs).
- **Part-3**: High-precision multi-turn editing data annotated by humans (95K editing pairs, 21K multi-turn rounds with a maximum of 5 rounds).

This repo contains Part-1 of SEED-Data-Edit, whose source images come from [Openimages](https://arxiv.org/pdf/1811.00982).

After downloading the data, first reassemble the split files into the original .tar.gz archives as shown below, then extract them.

```bash
cat source_images.tar.gz.part-* > source_images.tar.gz
cat target_images.tar.gz.part-* > target_images.tar.gz
tar -xzf source_images.tar.gz
tar -xzf target_images.tar.gz
```

The folder "annotations" contains the original instructions, while the folder "annotations_GPT4V" stores a small portion of the instructions rewritten by GPT-4V.

## SEED-X-Edit

You can download the image editing model SEED-X-Edit from [Model](https://huggingface.co/AILab-CVC/SEED-X-17B/tree/main/seed_x_edit); it is instruction-tuned on SEED-Data-Edit from the pre-trained [SEED-X](https://arxiv.org/abs/2404.14396). For inference with SEED-X-Edit, refer to [SEED-X](https://github.com/AILab-CVC/SEED-X/tree/main).

![image](https://github.com/AILab-CVC/SEED-X/blob/main/demos/edit_comparison.jpg?raw=true)

## License

SEED-Data-Edit is released under the CC-BY-NC-4.0 license for non-commercial research purposes only. Any commercial use of the dataset is strictly prohibited.
For Part-1, we use images from [Unsplash](https://github.com/unsplash/datasets) and [Openimages](https://arxiv.org/pdf/1811.00982). For Part-2, we collect images from [Photoshopbattles](https://www.reddit.com/r/photoshopbattles/), [Photoshop gurus](https://www.photoshopgurus.com/forum/), [Photoshoprequest](https://www.reddit.com/r/PhotoshopRequest/), and [Zhopped](http://zhopped.com/). For Part-3, we use images from [Unsplash](https://github.com/unsplash/datasets), [SAM](https://arxiv.org/abs/2304.02643), and [JourneyDB](https://arxiv.org/abs/2307.00716). Tencent does not hold the copyright for these images; the copyright belongs to the original owners. If any image in SEED-Data-Edit infringes upon your rights, please contact us immediately and we will promptly remove the corresponding data.
Provider:
AILab-CVC
Summary of original information

Dataset overview

Dataset name

  • SEED-Data-Edit

Dataset type

  • Hybrid dataset for instruction-guided image editing

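Dataset composition

  • Part-1: Large-scale, high-quality editing data generated by automated pipelines (3.5M editing pairs).
  • Part-2: Real-world scenario data collected from the internet (52K editing pairs).
  • Part-3: High-precision multi-turn editing data annotated by humans (95K editing pairs, 21K multi-turn rounds with a maximum of 5 rounds).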

Dataset size

  • 1M<n<10M

Language

  • English (en)

License

  • CC-BY-NC-4.0, for non-commercial research use only.

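Data sources

  • Part-1: Unsplash and Openimages (this repository contains the Openimages portion).
  • Part-2: Photoshopbattles, Photoshop gurus, Photoshoprequest, and Zhopped.
  • Part-3: Unsplash, SAM, and JourneyDB.

Usage

  • After downloading, reassemble the split files into the original .tar.gz archives, then extract them.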

```bash
cat source_images.tar.gz.part-* > source_images.tar.gz
cat target_images.tar.gz.part-* > target_images.tar.gz
```

  • The folder "annotations" contains the original instructions, while "annotations_GPT4V" stores a small portion of the instructions rewritten by GPT-4V.
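A slightly more defensive version of the reassembly step can be sketched as a bash function. This is a minimal sketch, not part of the official instructions: it assumes the `*.part-*` files sit in the current directory and that their suffixes sort in the original split order (as `split`'s default `aa`, `ab`, ... suffixes do). The function name `reassemble_and_extract` is hypothetical, and the `gzip -t` integrity check is an extra step added here for safety.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the official SEED-Data-Edit instructions.
# Assumes the split parts are in the current directory and that their
# suffixes sort in the original order (e.g. split's default aa, ab, ...).
set -euo pipefail

reassemble_and_extract() {
  local name="$1"  # archive base name, e.g. source_images
  local parts=( "${name}".tar.gz.part-* )

  # Without nullglob, an unmatched glob stays literal, so check it exists.
  if [[ ! -e "${parts[0]}" ]]; then
    echo "no parts found for ${name}" >&2
    return 1
  fi

  # Concatenate the parts in glob (sorted) order into one archive.
  cat "${parts[@]}" > "${name}.tar.gz"

  # Extra safety check: verify the gzip stream before extracting.
  gzip -t "${name}.tar.gz"

  # Extract into the current directory.
  tar -xzf "${name}.tar.gz"
}
```

Run it once per archive, for example `for name in source_images target_images; do reassemble_and_extract "$name"; done`. Because of `set -e`, a truncated or misordered set of parts stops the loop at `gzip -t` instead of producing a partially extracted folder.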