X2Edit-Dataset
Source: ModelScope community. Updated 2026-01-06; indexed 2025-08-16.
Download link:
https://modelscope.cn/datasets/Oppo/X2Edit-Dataset
Resource description:
<div align="center">
<h1>X2Edit</h1>
<a href='https://github.com/OPPO-Mente-Lab/X2Edit'><img src="https://img.shields.io/badge/GitHub-OPPOer/X2Edit-blue.svg?logo=github" alt="GitHub"></a>
<a href='https://arxiv.org/abs/2508.07607'><img src='https://img.shields.io/badge/arXiv-2508.07607-b31b1b.svg'></a>
<a href='https://huggingface.co/OPPOer/X2Edit'><img src='https://img.shields.io/badge/🤗%20HuggingFace-X2Edit-ffd21f.svg'></a>
<a href='https://www.modelscope.cn/datasets/AIGCer-OPPO/X2Edit-Dataset'><img src='https://img.shields.io/badge/🤖%20ModelScope-X2Edit%20Dataset-purple.svg'></a>
</div>
## Introduction
**X2Edit Dataset** is a comprehensive image editing dataset that covers 14 diverse editing tasks. In our comparisons (see the paper), it shows substantial advantages over existing open-source datasets such as AnyEdit, HQ-Edit, UltraEdit, SEED-Data-Edit, ImgEdit and OmniEdit.
For the data construction scripts and the model training and inference scripts, please refer to [**X2Edit**](https://github.com/OPPO-Mente-Lab/X2Edit).
## News
- 2025/09/16: We are about to release a sub-dataset of about **2 million** samples built with Qwen-Image and Qwen-Image-Edit. It focuses on subject-driven generation with facial consistency. Qwen-Image generates the original images, Qwen3 writes the editing instructions, and Qwen-Image-Edit produces the edited images. A face detection model then scores face detection confidence, and DINO and CLIP measure how similar the faces in the original and edited images are. The data is filtered on these two metrics.
## Dataset Statistics
### Data Distribution Constructed by Each Model
| Model | Samples |
|:------|:---------|
| Bagel | 502K |
| GPT-4o | 232K |
| Kontext | 2.2M |
| Step1X-Edit | 900K |
| LaMa | 200K |
| OmniConsistency | 250K |
| TextFlux | 280K |
| qwen-image-edit | 2M |
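Summing the rounded sizes in the table gives the overall scale of the dataset. Dividing each size by the roughly 5,000 samples per tar file (stated in the directory-structure section) gives a rough shard count per source. The table sizes are rounded, so treat these numbers as estimates only.

```python
# Per-source sample counts copied from the table above (K = 1,000; M = 1,000,000).
SOURCE_SIZES = {
    "Bagel": 502_000,
    "GPT-4o": 232_000,
    "Kontext": 2_200_000,
    "Step1X-Edit": 900_000,
    "LaMa": 200_000,
    "OmniConsistency": 250_000,
    "TextFlux": 280_000,
    "qwen-image-edit": 2_000_000,
}

SAMPLES_PER_TAR = 5_000  # approximate shard size stated in the directory-structure section


def estimated_shards(n_samples: int, per_tar: int = SAMPLES_PER_TAR) -> int:
    """Ceiling division: a partially filled last shard still needs its own tar file."""
    return -(-n_samples // per_tar)


total = sum(SOURCE_SIZES.values())
shards = {name: estimated_shards(n) for name, n in SOURCE_SIZES.items()}

if __name__ == "__main__":
    print(f"total samples: {total:,}")  # 6,564,000
    for name, n in shards.items():
        print(f"{name:>16}: ~{n} tar files")
```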
## Unified Directory Structure
```
X2Edit-data/
├── bagel/
│   ├── 0/
│   │   ├── 00000.tar
│   │   │   ├── 000000.1.0.jpg   # Original image
│   │   │   ├── 000000.2.0.jpg   # Edited image
│   │   │   ├── 000000.json      # Image metadata
│   │   │   ├── 000000.txt       # Editing instruction
│   │   │   └── ......
│   │   ├── 00001.tar
│   │   ├── 00002.tar
│   │   ├── 00003.tar
│   │   ├── 00004.tar
│   │   └── ......
│   ├── 1/
│   ├── 2/
│   ├── 3/
│   ├── 4/
│   ├── 5/
│   ├── 6/
│   └── 7/
├── gpt4o/
├── kontext/
├── kontext_subject/
├── lama/
├── ominiconsistencey/
├── step1x-edit/
├── qwen-image-edit-Asian-portrait/
├── qwen-image-edit-NonAsian-portrait/
└── textflux/
    └── 0/
        ├── 00000.tar
        │   ├── 000000.1.0.jpg   # Original image
        │   ├── 000000.1.1.jpg   # Mask image of the text foreground
        │   ├── 000000.2.0.jpg   # Edited image
        │   ├── 000000.json      # Image metadata
        │   ├── 000000.txt       # Editing instruction
        │   └── ......
        ├── 00001.tar
        ├── 00002.tar
        ├── 00003.tar
        ├── 00004.tar
        └── ......
```
Each subfolder is named after the model used to construct its data, and each tar file contains about 5,000 samples.
## JSON Format
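Given this layout, a small helper can group the tar shards by source folder before loading. The sketch below assumes the `<root>/<source>/<split>/<shard>.tar` nesting shown in the tree. `list_shards` is a hypothetical helper and is not part of the X2Edit scripts.

```python
from collections import defaultdict
from pathlib import Path


def list_shards(root: str) -> dict[str, list[Path]]:
    """Group tar shards under `root` by their source folder.

    Assumes the layout <root>/<source>/<split>/<NNNNN>.tar,
    e.g. X2Edit-data/bagel/0/00000.tar.
    """
    shards: dict[str, list[Path]] = defaultdict(list)
    for tar_path in sorted(Path(root).glob("*/*/*.tar")):
        source = tar_path.parts[-3]  # folder two levels above the shard, e.g. "bagel"
        shards[source].append(tar_path)
    return dict(shards)
```

For example, `{k: len(v) for k, v in list_shards("X2Edit-data").items()}` returns a per-source shard count that you can compare with the statistics table.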
### Common Fields
```python
{
    "caption_en": "string",                # English description of the image.
    "caption_zh": "string",                # Chinese description of the image.
    "instruction": "string",               # Editing instruction, in Chinese or English.
    "instruction_zh": "string",            # Editing instruction in Chinese.
    "task": "string",                      # Editing task type (e.g., "reasoning", "subject deletion").
    "model": "string",                     # Model used to construct the sample (e.g., "Kontext", "step1x-edit").
    "score_7b": "string",                  # Evaluation score from Qwen2.5-7B (e.g., "[5, 5]").
    "liqe_score": "float",                 # LIQE score of the original image.
    "liqe_score_edit": "float",            # LIQE score of the edited image.
    "liqe_score_clip": "float",            # LIQE-CLIP score of the original image.
    "liqe_score_clip_edit": "float",       # LIQE-CLIP score of the edited image.
    "aesthetic_score_v2_5": "float",       # Aesthetic score (v2.5) of the original image.
    "aesthetic_score_v2_5_edit": "float",  # Aesthetic score (v2.5) of the edited image.
}
```
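You can use these fields to apply your own quality filter. The sketch below parses `score_7b` (a string such as `"[5, 5]"`) and keeps a sample only if every score and the edited image's aesthetic score clear a threshold. The thresholds, the helper names, and the choice to compare every entry of `score_7b` against the same minimum are illustrative assumptions, not the authors' filtering rules.

```python
from __future__ import annotations

import json


def parse_score_7b(value: str | list) -> list[int]:
    """Parse score_7b, stored as a string such as "[5, 5]", into a list of ints."""
    if isinstance(value, str):
        value = json.loads(value)
    return [int(x) for x in value]


def keep_sample(
    meta: dict,
    min_score: int = 5,
    min_aesthetic_edit: float = 5.0,
) -> bool:
    """Hypothetical quality filter over the common JSON fields.

    Keeps a sample when every entry of score_7b is at least `min_score` and
    aesthetic_score_v2_5_edit is at least `min_aesthetic_edit`. The thresholds
    are placeholders. Samples missing either field are dropped.
    """
    raw = meta.get("score_7b")
    aesthetic = meta.get("aesthetic_score_v2_5_edit")
    if raw is None or aesthetic is None:
        return False
    scores = parse_score_7b(raw)
    return bool(scores) and min(scores) >= min_score and float(aesthetic) >= min_aesthetic_edit
```

Given a list of decoded JSON records `metas`, `[m for m in metas if keep_sample(m)]` keeps only the samples that pass.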
### Dataset-Specific Fields
#### step1x-edit
```python
{
    "score": "string",  # Evaluation score from Qwen2.5-72B.
}
```
#### kontext_subject
```python
{
    "dino": "float",   # DINOv2 similarity between the original and edited images.
    "clipI": "float",  # CLIP image-image similarity between the original and edited images.
    "clipT": "float",  # CLIP text-image similarity between the editing instruction and the edited image.
}
```
#### qwen-image-edit
```python
{
    "instruction_ori": "string",  # Chinese description of the image.
    "instruction": "string",      # Editing instruction in Chinese.
    "clipT": "float",             # CLIP similarity between the faces in the original and edited images.
    "dino": "float",              # DINOv2 similarity between the faces in the original and edited images.
    "confidence_in": "float",     # Face detection confidence on the original image.
    "confidence_out": "float",    # Face detection confidence on the edited image.
    "race": "string",             # Predicted race label (e.g., "East Asian").
    "race_conf": "float",         # Probability assigned to the predicted race label.
}
```
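The News item above describes how the Qwen-Image-Edit data is filtered by face detection confidence and facial similarity. The sketch below shows one way to write that kind of filter with these fields. The threshold values, and the choice to require a confident detection in both images, are assumptions and not the published settings.

```python
def keep_face_sample(
    meta: dict,
    min_confidence: float = 0.9,
    min_dino: float = 0.5,
    min_clip: float = 0.7,
) -> bool:
    """Hypothetical face-consistency filter for qwen-image-edit metadata.

    Requires a confident face detection in BOTH images (confidence_in and
    confidence_out) and sufficiently similar face crops (dino, clipT).
    Threshold values are illustrative guesses.
    """
    try:
        conf_in = float(meta["confidence_in"])
        conf_out = float(meta["confidence_out"])
        dino = float(meta["dino"])
        clip = float(meta["clipT"])
    except (KeyError, TypeError, ValueError):
        return False  # missing or malformed fields: drop the sample
    return (
        min(conf_in, conf_out) >= min_confidence
        and dino >= min_dino
        and clip >= min_clip
    )
```

Requiring both detections to pass guards against edits that remove or distort the face entirely.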
#### textflux
```python
{
    "font": [
        [
            "SHEIN",                              # Text content
            "(43,41) (225,41) (225,79) (43,79)"   # Corner points (x, y) of its text box
        ]
    ]  # One [text, box coordinates] pair per text region
}
```
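The `font` field stores each text string together with its box, written as a single string of four `(x, y)` points. The sketch below converts that string into numbers. It assumes integer pixel coordinates, as in the example above. `parse_quad`, `axis_aligned_bbox` and `parse_font_field` are hypothetical helpers.

```python
from __future__ import annotations

import re

Point = tuple[int, int]

# Matches one "(x,y)" pair, allowing optional whitespace after the comma.
_POINT_RE = re.compile(r"\((-?\d+),\s*(-?\d+)\)")


def parse_quad(coords: str) -> list[Point]:
    """Parse "(x1,y1) (x2,y2) (x3,y3) (x4,y4)" into a list of (x, y) tuples."""
    return [(int(x), int(y)) for x, y in _POINT_RE.findall(coords)]


def axis_aligned_bbox(points: list[Point]) -> tuple[int, int, int, int]:
    """Return (x_min, y_min, x_max, y_max) enclosing the polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def parse_font_field(font: list) -> list[tuple[str, list[Point]]]:
    """Turn the textflux "font" list of [text, coords] pairs into (text, points) tuples."""
    return [(text, parse_quad(coords)) for text, coords in font]
```

An axis-aligned box such as `(43, 41, 225, 79)` is in the `(left, upper, right, lower)` order that `PIL.Image.Image.crop` expects.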
## Usage Guide
### Download data
```bash
git lfs install
git clone https://huggingface.co/datasets/OPPOer/X2Edit-Dataset
```
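Cloning the whole repository downloads every subset. To fetch only part of the data, `snapshot_download` from the `huggingface_hub` library accepts an `allow_patterns` argument. The sketch below assumes that the repository's top-level folders match the tree above (e.g. `bagel/0/`). Check the actual file listing on the Hub and adjust the patterns if they differ. `download_subset` is a hypothetical wrapper.

```python
def download_subset(patterns: list[str], local_dir: str = "X2Edit-data") -> str:
    """Download only the files matching `patterns` from the dataset repo.

    Patterns are relative to the repository root; folder names such as
    "bagel/0/*" are assumptions based on the directory tree.
    Requires `pip install huggingface_hub`.
    """
    # Imported lazily so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="OPPOer/X2Edit-Dataset",
        repo_type="dataset",
        allow_patterns=patterns,
        local_dir=local_dir,
    )
```

For example, `download_subset(["bagel/0/*"])` would fetch only the first Bagel split, provided the folder names match.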
### Load data
```python
import json

from PIL import Image
from torchdata.dataloader2 import DataLoader2, MultiProcessingReadingService
from torchdata.datapipes.iter import FileOpener
from tqdm import tqdm


def decode(item):
    """Decode one (path, stream) tuple from a tar shard based on its extension."""
    key, value = item
    if key.endswith(".txt"):
        return key, value.read().decode("utf-8")
    if key.endswith(".jpg"):
        return key, Image.open(value).convert("RGB")
    if key.endswith(".json"):
        return key, json.load(value)
    return key, value.read()  # keep any other member as raw bytes


def collate_fn(examples):
    """Turn a batch of WebDataset samples into per-field lists."""
    # Depending on the torchdata version, suffix keys may carry a leading dot (".txt").
    examples = [{k.lstrip("."): v for k, v in ex.items()} for ex in examples]
    return {
        "key": [ex["__key__"].split("/")[-1] for ex in examples],
        "jpg1": [ex["1.0.jpg"] for ex in examples],      # original image
        "jpg2": [ex["2.0.jpg"] for ex in examples],      # edited image
        "txt": [ex["txt"] for ex in examples],           # editing instruction
        "json": [ex["json"] for ex in examples],         # metadata
        "jpg3": [ex.get("1.1.jpg") for ex in examples],  # text mask (textflux only), else None
    }


tar_path = "X2Edit-data/bagel/0/00000.tar"
rs = MultiProcessingReadingService(num_workers=1)
dataset = (
    FileOpener([tar_path], mode="b")
    .load_from_tar()
    .map(decode)
    .webdataset()
    .batch(1)
    .collate(collate_fn=collate_fn)
)
dl = DataLoader2(dataset, reading_service=rs)
for batch in tqdm(dl):
    for i in range(len(batch["json"])):
        meta = batch["json"][i]  # renamed so it does not shadow the json module
        jpg1 = batch["jpg1"][i]
        jpg2 = batch["jpg2"][i]
        txt = batch["txt"][i]
        jpg3 = batch["jpg3"][i]  # None unless the shard contains mask images
dl.shutdown()
```
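The loader above depends on torchdata's DataPipes and DataLoader2. Recent torchdata releases have deprecated and then removed these APIs, so it may need an older, pinned torchdata version. As a dependency-free alternative, the sketch below reads one shard with Python's standard `tarfile` module and groups members by their shared prefix, following the WebDataset-style naming in the directory tree (`000000.1.0.jpg`, `000000.json`, ...). `iter_samples` is a hypothetical helper. Images are returned as raw JPEG bytes, so decoding (for example with `PIL.Image.open(io.BytesIO(data))`) is left to the caller.

```python
from __future__ import annotations

import json
import tarfile
from typing import Iterator


def iter_samples(tar_path: str) -> Iterator[dict]:
    """Yield one dict per sample from a WebDataset-style tar shard.

    Members sharing a prefix (e.g. "000000.1.0.jpg", "000000.json",
    "000000.txt") are grouped, and the text after the first "." in the
    file name becomes the key ("1.0.jpg", "json", "txt"). Assumes that the
    members of one sample are stored consecutively, as WebDataset expects.
    """
    sample: dict = {}
    current = None
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not member.isfile():
                continue
            dirname, _, base = member.name.rpartition("/")
            if "." not in base:
                continue  # no extension, so no key for this member
            stem, suffix = base.split(".", 1)
            prefix = f"{dirname}/{stem}" if dirname else stem
            if prefix != current:
                if sample:
                    yield sample
                sample, current = {"__key__": prefix}, prefix
            data = tar.extractfile(member).read()
            if suffix == "json":
                sample[suffix] = json.loads(data)
            elif suffix == "txt":
                sample[suffix] = data.decode("utf-8")
            else:
                sample[suffix] = data  # raw bytes, e.g. JPEG images
    if sample:
        yield sample
```

Each yielded dict carries the same fields as the torchdata pipeline (`"1.0.jpg"`, `"2.0.jpg"`, `"json"`, `"txt"`, and `"1.1.jpg"` for textflux), so downstream code can treat the two readers alike.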
## Acknowledgement
[FLUX.1-Kontext-dev](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev), [Step1X-Edit](https://github.com/stepfun-ai/Step1X-Edit), [Bagel](https://github.com/ByteDance-Seed/Bagel), GPT-4o, [LaMa](https://github.com/advimman/lama), TextFlux, OmniConsistency.
## Citation
🌟 If you find our work helpful, please consider citing our paper and starring the repository:
```bibtex
@misc{ma2025x2editrevisitingarbitraryinstructionimage,
title={X2Edit: Revisiting Arbitrary-Instruction Image Editing through Self-Constructed Data and Task-Aware Representation Learning},
author={Jian Ma and Xujie Zhu and Zihao Pan and Qirong Peng and Xu Guo and Chen Chen and Haonan Lu},
year={2025},
eprint={2508.07607},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2508.07607},
}
```
Provided by:
maas
Created:
2025-08-19