OpenGPT-4o-Image
Source: ModelScope community (updated 2026-05-15, indexed 2025-10-11)
Download link: https://modelscope.cn/datasets/AI-ModelScope/OpenGPT-4o-Image
# OpenGPT-4o-Image Dataset
We introduce **OpenGPT-4o-Image**, a large-scale dataset constructed using a novel methodology that combines hierarchical task taxonomy with automated data generation. Our taxonomy not only includes fundamental capabilities such as text rendering and style control but also introduces highly practical yet challenging categories like **scientific imagery** for chemistry illustrations and **complex instruction editing** requiring simultaneous execution of multiple operations. Through an automated pipeline leveraging structured resource pools and GPT-4o, we generate 80k high-quality instruction-image pairs with controlled diversity, covering 11 major domains and 51 subtasks.
[Paper](https://huggingface.co/papers/2509.24900) | [Code](https://github.com/NROwind/OpenGPT-4o-Image)
<div align=center>
<img src="./assets/teaser.png" width = "90%" alt="Teaser Image" align=center/>
</div>
This dataset is designed for text-to-image and image editing tasks. It is split into two main parts:
* **Text-to-Image Generation**: Generating images from textual descriptions.
* **Image Editing**: Modifying existing images based on instructional prompts.
| Data Type | Number of Samples |
| :---------------------- | :---------------- |
| Text-to-Image Generation | ~40k |
| Image Editing | ~40k |
| **Total** | **~80k** |
## Quick Start
### 1. Download from Hugging Face
First, download all the split archive files (`gen.tar.gz.*` and `editing.tar.gz.*`) from the [Hugging Face repository](https://huggingface.co/datasets/WINDop/OpenGPT-4o-Image).
### 2. Decompress the Files
The dataset is split into multiple archives. Use the following commands in your terminal to merge and extract them.
```bash
# Decompress the text-to-image generation data
cat gen.tar.gz.* | tar -xzvf -
# Decompress the image editing data
cat editing.tar.gz.* | tar -xzvf -
```
After running these commands, you will get the `OpenGPT-4o-Image` directory containing all the data.
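For environments without `cat` and `tar` (for example, plain Windows shells), the same merge-and-extract step can be sketched in Python using only the standard library. This is a minimal sketch, not part of the official tooling: the function name `extract_split_archive` is hypothetical, and it assumes the part suffixes sort lexicographically in the correct order, which is the same assumption the shell glob makes. It also writes the merged archive to a temporary file, so it needs extra free disk space roughly equal to the archive size.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def extract_split_archive(prefix, src_dir=".", dest_dir="."):
    """Merge `<prefix>.*` parts in sorted order, extract them, return the part count.

    Hypothetical helper mirroring `cat <prefix>.* | tar -xzvf -`.
    """
    parts = sorted(Path(src_dir).glob(prefix + ".*"))
    if not parts:
        raise FileNotFoundError(f"no files matching {prefix}.* in {src_dir}")

    with tempfile.TemporaryFile() as merged:
        # Concatenate the parts byte for byte, exactly as `cat` would.
        for part in parts:
            with part.open("rb") as fh:
                shutil.copyfileobj(fh, merged)
        merged.seek(0)

        with tarfile.open(fileobj=merged, mode="r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                # Safer extraction on Python versions that support filters.
                tar.extractall(dest_dir, filter="data")
            else:
                tar.extractall(dest_dir)
    return len(parts)


# Example (run from the folder that holds the downloaded parts):
# extract_split_archive("gen.tar.gz")
# extract_split_archive("editing.tar.gz")
```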
## Dataset Structure
The decompressed directory has the following structure:
```
├── OpenGPT-4o-Image
│ ├── gen/ # Contains images for the generation task
│ ├── editing/ # Contains input/output images for the editing task
│ ├── gen.json # Annotations for the generation task
│ └── editing.json # Annotations for the editing task
```
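A quick sanity check after extraction can confirm that the four top-level entries from the tree above are present. The sketch below is illustrative only; `missing_entries` and `EXPECTED_ENTRIES` are hypothetical names, and the check looks at existence, not at file contents or sample counts.

```python
from pathlib import Path

# Entries taken from the directory tree shown above.
EXPECTED_ENTRIES = ("gen", "editing", "gen.json", "editing.json")


def missing_entries(root):
    """Return the expected top-level entries that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


# Example:
# print(missing_entries("OpenGPT-4o-Image"))  # an empty list means nothing is missing
```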
## Data Format
The dataset annotations are provided in two JSON files, corresponding to the two sub-tasks. Each line in the JSON file is a JSON object.
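Because each line is described as a separate JSON object, a standard `json.load` on the whole file would likely fail on a multi-line file. The following is a minimal loading sketch under that assumption; the helper name `load_annotations` is hypothetical, and the fallback for a single JSON document (such as an array) is a defensive guess in case the files are stored differently from the description.

```python
import json
from pathlib import Path


def load_annotations(path):
    """Load annotation records from `gen.json` or `editing.json` as a list of dicts."""
    text = Path(path).read_text(encoding="utf-8")
    try:
        # Case 1: the file is one valid JSON document (an array or a single object).
        data = json.loads(text)
    except json.JSONDecodeError:
        # Case 2: one JSON object per line, as the description above states.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


# Example:
# records = load_annotations("OpenGPT-4o-Image/gen.json")
# print(len(records), records[0]["input_prompt"])
```

The function reads the whole file into memory, which should be acceptable for annotation files of roughly 40k records each.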
### `gen.json` (Text-to-Image Generation)
This file contains prompts and their corresponding generated image paths.
* `input_prompt`: The text prompt used for image generation.
* `output_image`: The relative path to the generated image.
**Example:**
```json
{
"input_prompt": "Collage style. Weave several satin‑finish orchid hexagons with woolen peach cubes, floating against a gradient backdrop.",
"output_image": "gen/0.png"
}
```
### `editing.json` (Image Editing)
This file contains editing instructions, input images, and the resulting output images.
* `input_prompt`: The instruction describing the desired edit.
* `input_image`: A list containing the relative path to the source image to be edited.
* `output_image`: The relative path to the edited result image.
**Example:**
```json
{
"input_prompt": "Remove the word 'SALAD' at the top of the chalkboard.",
"input_image": [
"editing/input_0.png"
],
"output_image": "editing/output_0.png"
}
```
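Since `input_image` is a list while `output_image` is a single string, it can help to normalize an editing record into a small typed structure with resolved paths. The sketch below shows one way to do this; `EditSample` and `parse_edit_record` are hypothetical names, and treating the paths as relative to the extracted `OpenGPT-4o-Image` directory is an assumption based on the directory structure described earlier.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass(frozen=True)
class EditSample:
    instruction: str            # value of `input_prompt`
    sources: tuple[Path, ...]   # resolved `input_image` entries
    target: Path                # resolved `output_image`


def parse_edit_record(record: dict[str, Any], root: str | Path) -> EditSample:
    """Resolve the relative paths of one `editing.json` record against `root`."""
    root = Path(root)
    sources = record["input_image"]
    if isinstance(sources, str):
        # Defensive: tolerate a bare string even though a list is documented.
        sources = [sources]
    return EditSample(
        instruction=record["input_prompt"],
        sources=tuple(root / p for p in sources),
        target=root / record["output_image"],
    )
```

Keeping `sources` as a tuple preserves the list semantics of `input_image`, so the same structure would still work if a record ever lists more than one source image.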
## Resources
- GitHub: [https://github.com/NROwind/OpenGPT-4o-Image](https://github.com/NROwind/OpenGPT-4o-Image)
- Paper: [OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing](https://huggingface.co/papers/2509.24900)
## Citation
If you use this dataset in your research, please consider citing:
```bibtex
@misc{chen2025opengpt4oimagecomprehensivedatasetadvanced,
title={OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing},
author={Zhihong Chen and Xuehai Bai and Yang Shi and Chaoyou Fu and Huanyu Zhang and Haotian Wang and Xiaoyan Sun and Zhang Zhang and Liang Wang and Yuanxing Zhang and Pengfei Wan and Yi-Fan Zhang},
year={2025},
eprint={2509.24900},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2509.24900},
}
```
## 💡 Representative Examples of Each Domain
<div align=center>
<img src="./assets/generation-examples.png" width = "100%" alt="Generation Examples" align=center/>
</div>
<div align=center>
<img src="./assets/editing-examples.png" width = "100%" alt="Editing Examples" align=center/>
</div>
Provider: maas
Created: 2025-10-09



