Complex-Edit

Name: Complex-Edit
Creator: maas
Published: 2025-12-05 12:04:52
License: No description available

ModelScope community: updated 2025-12-05, indexed 2025-04-26

Download link:

https://modelscope.cn/datasets/UCSC-VLAA/Complex-Edit

Resource description:

# ***Complex-Edit***: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark

[📃Arxiv](https://arxiv.org/abs/2504.13143) | [🌐Project Page](https://ucsc-vlaa.github.io/Complex-Edit/) | [💻Github](https://github.com/UCSC-VLAA/Complex-Edit) | [📚Dataset](https://huggingface.co/datasets/UCSC-VLAA/Complex-Edit) | [📄HF Paper](https://huggingface.co/papers/2504.13143)

We introduce ***Complex-Edit***, a comprehensive benchmark designed to systematically evaluate instruction-based image editing models across instructions of varying complexity. To build this benchmark, we use GPT-4o to automatically collect a diverse set of editing instructions at scale. Our approach follows a structured “Chain-of-Edit” pipeline: we first generate individual atomic editing tasks independently and then integrate them into cohesive, complex instructions. We also introduce a suite of metrics that assess different aspects of editing performance, along with a VLM-based auto-evaluation pipeline that supports large-scale assessment.

Our benchmark yields several notable insights:

1. Open-source models significantly underperform proprietary, closed-source models, and the performance gap widens as instruction complexity increases.
2. Increased instruction complexity primarily impairs the models' ability to retain key elements from the input images and to preserve overall aesthetic quality.
3. Decomposing a complex instruction into a sequence of atomic steps and executing them one by one substantially degrades performance across multiple metrics.
4. A straightforward Best-of-N selection strategy improves results for both direct editing and the step-by-step sequential approach.
5. We observe a “curse of synthetic data”: when synthetic data is involved in model training, the edited images from such models tend to look increasingly synthetic as the complexity of the editing instructions rises, a phenomenon that also appears in the latest GPT-4o outputs.

## Folder Structure

The dataset is organized as follows:

```
├── README.md
└── test
    ├── real
    │   ├── images
    │   │   ├── 0000.png
    │   │   ├── 0001.png
    │   │   ├── 0002.png
    │   │   └── ...
    │   └── metadata.jsonl
    └── syn
        ├── images
        │   ├── 0000.png
        │   ├── 0001.png
        │   ├── 0002.png
        │   └── ...
        └── metadata.jsonl
```

Input images are stored under `test`, and the editing instructions are stored in the `edit` field of each record in `metadata.jsonl` (exposed as `sample["edit"]` in the Usage example below). `real` and `syn` refer to real-life and synthetic input images, respectively.

## Example JSON

The structure of an `edit` record is shown below:

```json
{
  "reasoning": "...",
  "original_sequence": [
    {
      "name": "Change Background",
      "instruction": "Replace the existing background with a busy metropolitan skyline."
    },
    {
      "name": "Add Special Effects",
      "instruction": "Add motion blur to the cars to depict motion."
    },
    ...
  ],
  "sequence": [
    {
      "name": "Change Background",
      "instruction": "Replace the existing background with a busy metropolitan skyline."
    },
    {
      "name": "Add Special Effects",
      "instruction": "Add motion blur to the cars."
    },
    ...
  ],
  "compound": [
    {
      "reasoning": "none",
      "compound_instruction": "Replace the existing background with a busy metropolitan skyline."
    },
    {
      "reasoning": "...",
      "compound_instruction": "Replace the background with a busy metropolitan skyline and apply motion blur to the cars to simulate movement."
    },
    ...
  ]
}
```

Each `edit` record contains a sequence of atomic instructions, `sequence`, and 8 compound instructions, `compound`, for the corresponding input image. `original_sequence` is the sequence of atomic instructions before simplification. The compound instructions span complexity levels $C_1$ to $C_8$ in ascending order.
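To work with these records programmatically, the helpers below are a minimal sketch: they index the compound instructions by complexity level and extract the atomic steps from an `edit` dict shaped like the example above. The function names (`iter_edits`, `compound_by_level`, `atomic_steps`) are hypothetical, and `iter_edits` assumes each line of `metadata.jsonl` is a JSON object that stores the instructions as a nested object under an `edit` key, mirroring the field name used in the Usage example; check this against the actual files before relying on it.

```python
import json
from typing import Any, Dict, Iterator, List

NUM_LEVELS = 8  # the card states there are 8 compound instructions, C1..C8


def iter_edits(metadata_path: str) -> Iterator[Dict[str, Any]]:
    """Yield the `edit` dict from each non-empty line of a metadata.jsonl file.

    Assumption (hypothetical): every record keeps its instructions under "edit".
    """
    with open(metadata_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)["edit"]


def compound_by_level(edit: Dict[str, Any]) -> Dict[str, str]:
    """Map "C1".."C8" to the compound instruction at that complexity level."""
    compound = edit["compound"]
    if len(compound) != NUM_LEVELS:
        raise ValueError(f"expected {NUM_LEVELS} compound instructions, got {len(compound)}")
    # The list is ordered by ascending complexity, so index i is level C(i + 1).
    return {f"C{i + 1}": c["compound_instruction"] for i, c in enumerate(compound)}


def atomic_steps(edit: Dict[str, Any], simplified: bool = True) -> List[str]:
    """Return the atomic instructions from `sequence` (simplified) or `original_sequence`."""
    key = "sequence" if simplified else "original_sequence"
    return [step["instruction"] for step in edit[key]]
```

For instance, `compound_by_level(edit)["C3"]` returns the third-level compound instruction, and `atomic_steps(edit, simplified=False)` returns the unsimplified steps.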
## Usage

```py
from datasets import load_dataset

dataset = load_dataset("UCSC-VLAA/Complex-Edit")
sample = dataset["test_real"][0]

# Print the compound instructions, from complexity C1 to C8.
for i, compound in enumerate(sample["edit"]["compound"]):
    print(f"C{i + 1} Instruction: {compound['compound_instruction']}")

# Print the atomic instruction sequence.
for i, step in enumerate(sample["edit"]["sequence"]):
    print(f"Step #{i + 1} Atomic Instruction: {step['instruction']}")

# Show the input image (decoded as a PIL image).
sample["image"].show()
```
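Insights 3 and 4 above contrast direct editing with step-by-step sequential editing and note that Best-of-N selection helps both. The sketch below shows one way to express both modes on top of an `edit` record; it is not the authors' evaluation code. `edit_fn` and `score_fn` are hypothetical placeholders for an editing model and a scorer (for example a VLM judge), and the step-by-step mode assumes, as the JSON example suggests, that level $C_k$ combines the first $k$ atomic instructions in `sequence`. Selection is applied after every step here, which is one design choice among several.

```python
from typing import Any, Callable, Dict

# Hypothetical callables (not part of the dataset or any library):
#   edit_fn(image, instruction, seed) -> edited image
#   score_fn(source_image, edited_image, instruction) -> float, higher is better
EditFn = Callable[[Any, str, int], Any]
ScoreFn = Callable[[Any, Any, str], float]


def best_of_n(image: Any, instruction: str, edit_fn: EditFn,
              score_fn: ScoreFn, n: int = 4) -> Any:
    """Generate n candidates with different seeds and keep the best-scoring one."""
    candidates = [edit_fn(image, instruction, seed) for seed in range(n)]
    return max(candidates, key=lambda out: score_fn(image, out, instruction))


def direct_edit(image: Any, edit: Dict[str, Any], level: int,
                edit_fn: EditFn, score_fn: ScoreFn, n: int = 4) -> Any:
    """Apply the compound instruction for level C_level in a single pass."""
    instruction = edit["compound"][level - 1]["compound_instruction"]
    return best_of_n(image, instruction, edit_fn, score_fn, n)


def step_by_step_edit(image: Any, edit: Dict[str, Any], level: int,
                      edit_fn: EditFn, score_fn: ScoreFn, n: int = 4) -> Any:
    """Apply the first `level` atomic instructions one at a time.

    Assumption: C_k corresponds to the first k entries of `sequence`.
    Best-of-N selection is applied after each intermediate step.
    """
    current = image
    for step in edit["sequence"][:level]:
        current = best_of_n(current, step["instruction"], edit_fn, score_fn, n)
    return current
```

Plugging in a real model means wrapping it so it accepts a seed, and wrapping the scorer so that a higher value means a better edit.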

Provided by:

maas

Created:

2025-04-21