BM-6M-Demo

Name: BM-6M-Demo
Creator: maas
Published: 2026-01-06 16:34:19
License: No description provided

ModelScope community · Updated 2026-01-06 · Indexed 2025-05-31

Download link:

https://modelscope.cn/datasets/ByteDance-Seed/BM-6M-Demo


Resource description:

[![Paper](https://img.shields.io/badge/%20arXiv-Paper-red)](https://arxiv.org/abs/2506.03107)
[![Project-Page](https://img.shields.io/badge/%20Project-Website-blue)](https://boese0601.github.io/bytemorph/)
[![Benchmark](https://img.shields.io/badge/🤗%20Huggingface-Benchmark-yellow)](https://huggingface.co/datasets/ByteDance-Seed/BM-Bench)
[![Dataset-Demo](https://img.shields.io/badge/🤗%20Huggingface-Dataset_Demo-yellow)](https://huggingface.co/datasets/ByteDance-Seed/BM-6M-Demo)
[![Dataset](https://img.shields.io/badge/🤗%20Huggingface-Dataset-yellow)](https://huggingface.co/datasets/ByteDance-Seed/BM-6M)
[![Gradio-Demo](https://img.shields.io/badge/🤗%20Huggingface-Gradio_Demo-yellow)](https://huggingface.co/spaces/Boese0601/ByteMorpher-Demo)
[![Checkpoint](https://img.shields.io/badge/🤗%20Huggingface-Checkpoint-yellow)](https://huggingface.co/ByteDance-Seed/BM-Model)
[![Code](https://img.shields.io/badge/%20Github-Code-blue)](https://github.com/ByteDance-Seed/BM-code)

# Dataset Card for ByteMorph-6M-Demo

Editing images to reflect non-rigid motions, such as changes in camera viewpoint, object deformation, human articulation, or complex interactions, is a significant yet underexplored frontier in computer vision. Current methods and datasets often concentrate on static imagery or rigid transformations, which limits their applicability to expressive edits involving dynamic movement. To bridge this gap, we present ByteMorph, a large-scale benchmark created specifically for instruction-based image editing focused on non-rigid motions.

This dataset card contains the example training data subset and instructions for ByteMorph-6M. For the full training data, please visit [this repo](https://huggingface.co/datasets/ByteDance-Seed/BM-6M).

## Dataset Details

Original videos are generated by [Seaweed](https://seaweed.video/) and sampled into frames that serve as source-target image editing pairs. These frames are further filtered and captioned by a vision-language model (VLM).
## Intended use

Primary intended uses: The primary use of ByteMorph is research on text-to-image and instruction-based image editing.

Primary intended users: The dataset's primary intended users are researchers and hobbyists in computer vision, image generation, image processing, and AIGC.

## Dataset Structure

```
{
  # Name of the sampled image pair from the generated video; i and j denote
  # the sampled image indices. If there is no index, the frames are sampled
  # from the start and end of the video.
  "image_id": "[video_name]_frame_[i]_[j]",
  # Source image
  "src_img": "...",
  # Target image after editing
  "tgt_img": "...",
  # VLM caption of the edit
  "edit_prompt": "The camera angle shifts to a closer view, more people appear in the frame, and the individuals are now engaged in a discussion or negotiation.",
  # The VLM caption rewritten as an editing instruction
  "edit_prompt_rewrite_instruction": "Zoom in the camera angle, add more people to the frame, and adjust the individuals' actions to show them engaged in a discussion or negotiation.",
  # Caption of the source image
  "src_img_caption": "Several individuals are present, including three people wearing camouflage uniforms, blue helmets, and blue vests labeled "UN." ... ",
  # Caption of the target image
  "tgt_img_caption": "Several individuals are gathered in an outdoor setting. Two people wearing blue helmets and blue vests with "UN" written on them are engaged in a discussion. ... ",
}
```

### How to use ByteMorph-6M-Demo

Please preprocess this dataset demo and visualize the images with the following script.
```python
import json
import os

from datasets import load_dataset
from PIL import Image
from tqdm import tqdm

# Load dataset
ds = load_dataset("ByteDance-Seed/BM-6M-Demo", split="train")

# Define the output root directory and create it once, before the loop
output_root = "./train_dataset/"
os.makedirs(output_root, exist_ok=True)

for example in tqdm(ds):
    image_id = example["image_id"]

    # Source and target images (used as PIL images below)
    source_img = example["src_img"]
    target_img = example["tgt_img"]

    # Concatenate side by side; the canvas is sized from the source image
    w, h = source_img.size
    combined = Image.new("RGB", (w * 2, h))
    combined.paste(source_img, (0, 0))
    combined.paste(target_img, (w, 0))

    # Save combined image
    out_img_path = os.path.join(output_root, f"{image_id}.png")
    combined.save(out_img_path)

    # Save the edit prompts and captions as a JSON file
    out_json_path = os.path.join(output_root, f"{image_id}.json")
    json_content = {
        "edit": example["edit_prompt"],
        "edit_rewrite": example["edit_prompt_rewrite_instruction"],
        "input": example["src_img_caption"],
        "output": example["tgt_img_caption"],
    }
    with open(out_json_path, "w", encoding="utf-8") as f:
        json.dump(json_content, f, indent=2)
```

## Bibtex citation

```bibtex
@article{chang2025bytemorph,
  title={ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions},
  author={Chang, Di and Cao, Mingdeng and Shi, Yichun and Liu, Bo and Cai, Shengqu and Zhou, Shijie and Huang, Weilin and Wetzstein, Gordon and Soleymani, Mohammad and Wang, Peng},
  journal={arXiv preprint arXiv:2506.03107},
  year={2025}
}
```
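The `image_id` convention described under Dataset Structure (`[video_name]_frame_[i]_[j]`) can be unpacked with a small helper, for example to group pairs by their originating video. The following is only a sketch: the regular expression, the `PairId` type, and the `parse_image_id` function are hypothetical names introduced here, and the exact form of an ID without indices is an assumption (the card only states that such pairs come from the start and end of the video).

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, derived from the "[video_name]_frame_[i]_[j]" description.
# The real IDs may differ; adjust the expression after inspecting the data.
_PAIR_ID = re.compile(r"^(?P<video>.+)_frame_(?P<i>\d+)_(?P<j>\d+)$")


class PairId(NamedTuple):
    """Hypothetical container for the parts of an image_id."""
    video_name: str
    index_i: Optional[int]  # None when the ID carries no explicit indices
    index_j: Optional[int]


def parse_image_id(image_id: str) -> PairId:
    """Split an image_id into the video name and the two sampled indices."""
    match = _PAIR_ID.match(image_id)
    if match is None:
        # No explicit indices: per the dataset card, the pair was sampled
        # from the start and end of the video. The whole ID is returned as
        # the video name (an assumption about the index-free form).
        return PairId(image_id, None, None)
    return PairId(match.group("video"), int(match.group("i")), int(match.group("j")))


if __name__ == "__main__":
    # "clip_001" is a made-up video name used purely for illustration.
    print(parse_image_id("clip_001_frame_3_17"))
    print(parse_image_id("clip_001"))
```

Returning `None` for the indices, rather than guessing frame numbers, keeps the start-and-end case distinguishable from explicitly indexed pairs.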


Provider:

maas

Created:

2025-05-30
