a1557811266/Inter-Edit-Train

Name: a1557811266/Inter-Edit-Train
Creator: a1557811266
Published: 2026-04-04 10:30:15
License: No description available

Hugging Face | Updated: 2026-04-04 | Indexed: 2026-04-12

Download link:

https://hf-mirror.com/datasets/a1557811266/Inter-Edit-Train

Description:

---
language:
- en
- zh
pretty_name: Inter-Edit-Train
size_categories:
- 1M<n<10M
task_categories:
- image-to-image
tags:
- image-editing
- benchmark
- computer-vision
- multimodal
---

# Inter-Edit-Train

Inter-Edit-Train is the official large-scale training set released for the CVPR 2026 paper **Inter-Edit: First Benchmark for Interactive Instruction-Based Image Editing**. This dataset is designed for the Interactive Instruction-based Image Editing (I^3E) task, where a model performs localized image edits from a concise textual instruction together with imprecise spatial guidance.

## Highlights

- **1,099,964** image editing pairs
- **610,186** unique source images
- Four edit types: **Local**, **Add**, **Remove**, and **Texture**
- Seven common aspect ratios from **16:9** to **9:16**
- Includes edit instructions, masks, bounding boxes, and an extra `better_data` flag
- All release filenames are anonymized with index-based names; original internal filenames are not exposed in the packaged archives

## Relation to the paper

This release corresponds to the training split described in the paper. The released manifest keeps the fields needed for training and data usage:

- `instruction`
- `edit_type`
- `bounding_box`
- `bbox_reference_dimensions`
- `better_data`
- anonymized source / target / mask asset locations

The key `better_data` is **not** a paper-defined benchmark field. It is an additional release-only flag indicating samples that were judged to be more suitable for training after filtering.
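
As an illustration of how the `better_data` flag and `edit_type` field might be used when building a training subset, here is a minimal sketch. The helper name `select_records` and the demo records are hypothetical and not part of the release; the sketch assumes `better_data` is a JSON boolean and that `edit_type` holds one of the four type names listed in the highlights.

```python
from typing import Iterable, Iterator, Optional


def select_records(
    records: Iterable[dict],
    better_only: bool = True,
    edit_types: Optional[Iterable[str]] = None,
) -> Iterator[dict]:
    """Yield manifest records that pass the requested filters.

    better_only: keep only records whose `better_data` flag is true.
    edit_types:  optional collection of edit type names to keep.
    """
    wanted = set(edit_types) if edit_types is not None else None
    for rec in records:
        if better_only and not rec.get("better_data", False):
            continue
        if wanted is not None and rec.get("edit_type") not in wanted:
            continue
        yield rec


# Hypothetical records carrying only the fields the filter needs.
demo = [
    {"sample_id": 0, "edit_type": "Add", "better_data": False},
    {"sample_id": 1, "edit_type": "Remove", "better_data": True},
    {"sample_id": 2, "edit_type": "Texture", "better_data": True},
]

kept = list(select_records(demo, edit_types={"Remove", "Local"}))
```

Because the flag is release-only, filtering on it is a training-time choice; passing `better_only=False` keeps every record.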
## Data layout

Because the full training set is extremely large, the assets are released as sharded tar archives:

- `source_shards/source-xxxxx-of-xxxxx.tar`
- `asset_shards/asset-xxxxx-of-xxxxx.tar`
- `metadata/train-xxxxx-of-xxxxx.jsonl.gz`

Each asset name inside the tar archives is anonymized:

- source image: `sources/source_0000000.png`
- edited image: `targets/target_0000000.png`
- mask image: `masks/mask_0000000.png`

Each metadata row records which tar shard and which internal filename should be used for that sample.

## Metadata schema

Each JSONL record contains:

- `sample_id`: zero-based sample index
- `source_id`: zero-based unique source-image index
- `edit_type`
- `instruction`
- `better_data`
- `bounding_box`
- `bbox_reference_dimensions`
- `source_archive`
- `source_file`
- `asset_archive`
- `target_file`
- `mask_file`

## Example metadata entry

```json
{
  "sample_id": 0,
  "source_id": 0,
  "edit_type": "Add",
  "instruction": "添加一双发光的筷子",
  "better_data": false,
  "bounding_box": [357, 694, 902, 926],
  "bbox_reference_dimensions": {"width": 960, "height": 960},
  "source_archive": "source_shards/source-00000-of-00245.tar",
  "source_file": "sources/source_0000000.png",
  "asset_archive": "asset_shards/asset-00000-of-00275.tar",
  "target_file": "targets/target_0000000.png",
  "mask_file": "masks/mask_0000000.png"
}
```

## Usage notes

- This is the **training** release, not the manually annotated test benchmark.
- The canonical sample order follows the original `Inter-Edit-train.json`.
- Source images are deduplicated globally and indexed separately from sample indices.
- Asset filenames are anonymized by design.
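
The layout and schema above suggest a simple way to resolve one sample: read a record from a metadata shard, then pull the named files out of the tar shards it points to. The following is a minimal sketch using only the Python standard library, not an official loader. The function names (`iter_metadata`, `read_member`, `load_sample`, `scale_bbox`) and the `root` argument (a local directory holding the downloaded shards) are hypothetical. `scale_bbox` additionally assumes that `bounding_box` is expressed in the pixel space given by `bbox_reference_dimensions` and that its four values alternate horizontal and vertical coordinates; the dataset card does not state this explicitly.

```python
import gzip
import json
import tarfile
from pathlib import Path


def iter_metadata(path):
    """Yield one dict per non-empty line of a gzip-compressed JSONL shard."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def read_member(root, archive, member):
    """Return the raw bytes of `member` stored in the tar shard `root/archive`."""
    with tarfile.open(Path(root) / archive, "r") as tar:
        fh = tar.extractfile(member)  # raises KeyError if the name is absent
        if fh is None:  # member exists but is not a regular file
            raise FileNotFoundError(f"{member} is not a file in {archive}")
        return fh.read()


def load_sample(root, rec):
    """Fetch the source, target, and mask image bytes for one metadata record."""
    return {
        "source": read_member(root, rec["source_archive"], rec["source_file"]),
        "target": read_member(root, rec["asset_archive"], rec["target_file"]),
        "mask": read_member(root, rec["asset_archive"], rec["mask_file"]),
    }


def scale_bbox(rec, width, height):
    """Rescale the box from its reference dimensions to (width, height).

    Assumption: values at indices 0 and 2 are horizontal, 1 and 3 vertical.
    """
    ref = rec["bbox_reference_dimensions"]
    sx = width / ref["width"]
    sy = height / ref["height"]
    a, b, c, d = rec["bounding_box"]
    return [a * sx, b * sy, c * sx, d * sy]
```

Reopening a tar archive for every file is slow on large shards; a practical loader would more likely keep each archive open or stream its members sequentially. The returned bytes can be decoded with any image library.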
## Citation

If you use this dataset, please cite:

```bibtex
@inproceedings{liu2026interedit,
  title={Inter-Edit: First Benchmark for Interactive Instruction-Based Image Editing},
  author={Liu, Delong and Hou, Haotian and Hou, Zhaohui and Huang, Zhiyuan and Han, Shihao and Zhan, Mingjie and Zhao, Zhicheng and Su, Fei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2026}
}
```

Provided by:

a1557811266
