UniWorld-V1

Name: UniWorld-V1
Creator: maas
Published: 2025-12-12 17:40:55
License: No description available

ModelScope community; updated 2025-12-12; indexed 2025-06-07

Download link:

https://modelscope.cn/datasets/PKU-YuanLab/UniWorld-V1

Resource description:

<p style="color:red; font-size:25px"> The GenEval-style dataset is sourced from <a href="https://huggingface.co/datasets/BLIP3o/BLIP3o-60k" style="color:red">BLIP3o-60k</a>. </p>

This dataset is presented in the paper: [UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation](https://huggingface.co/papers/2506.03147)

More details can be found in [UniWorld-V1](https://github.com/PKU-YuanGroup/UniWorld-V1)

### Data preparation

Download the data from [LanguageBind/UniWorld-V1](https://huggingface.co/datasets/LanguageBind/UniWorld-V1). The dataset consists of two parts: source images and annotation JSON files.

Prepare a `data.txt` file in the following format:

1. The first column is the root path to the images.
2. The second column is the corresponding annotation JSON file.
3. The third column indicates whether to enable the region-weighting strategy. We recommend setting it to true for editing data and false for all other data.

```
data/BLIP3o-60k,json/blip3o_t2i_58859.json,false
data/coco2017_caption_canny-236k,coco2017_canny_236574.json,false
data/imgedit,json/imgedit/laion_add_part0_edit.json,true
```

We have prepared a `data.txt` file for ImgEdit for your reference.

```
data/imgedit/action/action,json/imgedit/pandam_action_edit.json,true
data/imgedit/action/action_part2,json/imgedit/pandam2_action_edit.json,true
data/imgedit/action/action_part3,json/imgedit/pandam3_action_edit.json,true
data/imgedit/action/action_part4,json/imgedit/pandam4_action_edit.json,true
data/imgedit/add/add_part0,json/imgedit/laion_add_part0_edit.json,true
data/imgedit/add/add_part1,json/imgedit/laion_add_part1_edit.json,true
data/imgedit/add/add_part4,json/imgedit/results_add_laion_part4_edit.json,true
data/imgedit/add/add_part5,json/imgedit/results_add_laion_part5_edit.json,true
data/imgedit/adjust/adjust_part0,json/imgedit/results_adjust_canny_laion_part0_edit.json,true
data/imgedit/adjust/adjust_part2,json/imgedit/results_adjust_canny_laion_part2_edit.json,true
data/imgedit/adjust/adjust_part3,json/imgedit/results_adjust_canny_laion_part3_edit.json,true
data/imgedit/adjust/adjust_part4,json/imgedit/laion_adjust_canny_part4_edit.json,true
data/imgedit/background/background_part0,json/imgedit/results_background_laion_part0_edit.json,true
data/imgedit/background/background_part2,json/imgedit/results_background_laion_part2_edit.json,true
data/imgedit/background/background_part3,json/imgedit/laion_background_part3_edit.json,true
data/imgedit/background/background_part5,json/imgedit/laion_background_part5_edit.json,true
data/imgedit/background/background_part7,json/imgedit/laion_background_part7_edit.json,true
data/imgedit/compose/compose_part0,json/imgedit/results_compose_part0_edit.json,false
data/imgedit/compose/compose_part2,json/imgedit/results_compose_part2_edit.json,false
data/imgedit/compose/compose_part6,json/imgedit/results_compose_part6_fix_edit.json,false
data/imgedit/refine_replace/refine_replace_part1,json/imgedit/results_extract_ref_part1_refimg_edit.json,true
data/imgedit/remove/remove_part0,json/imgedit/laion_remove_part0_edit.json,true
data/imgedit/remove/remove_part1,json/imgedit/results_remove_laion_part1_edit.json,true
data/imgedit/remove/remove_part4,json/imgedit/results_remove_laion_part4_edit.json,true
data/imgedit/remove/remove_part5,json/imgedit/results_remove_laion_part5_edit.json,true
data/imgedit/replace/replace_part0,json/imgedit/laion_replace_part0_edit.json,true
data/imgedit/replace/replace_part1,json/imgedit/laion_replace_part1_edit.json,true
data/imgedit/replace/replace_part4,json/imgedit/results_replace_laion_part4_edit.json,true
data/imgedit/replace/replace_part5,json/imgedit/results_replace_laion_part5_edit.json,true
data/imgedit/transfer/transfer,json/imgedit/results_style_transfer_edit.json,false
data/imgedit/transfer/transfer_part0,json/imgedit/results_style_transfer_part0_cap36472_edit.json,false
```

### Data details

Text-to-Image Generation

- [BLIP3o-60k](https://huggingface.co/datasets/BLIP3o/BLIP3o-60k): We add text-to-image instructions to half of the data. [108 GB storage usage.]
- [OSP1024-286k](https://huggingface.co/datasets/LanguageBind/UniWorld-V1/tree/main/data/OSP1024-286k): Sourced from internal data of the [Open-Sora Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan), with captions generated using [Qwen2-VL-72B](https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct). Images have an aspect ratio between 3:4 and 4:3, an aesthetic score ≥ 6, and a short side ≥ 1024 pixels. [326 GB storage usage.]

Image Editing

- [imgedit-724k](https://huggingface.co/datasets/sysuyy/ImgEdit/tree/main): Data is filtered using GPT-4o, retaining approximately half. [2.8 TB storage usage.]
- [OmniEdit-368k](https://huggingface.co/datasets/TIGER-Lab/OmniEdit-Filtered-1.2M): For image editing data, samples with edited regions smaller than 1/100 were filtered out; images have a short side ≥ 1024 pixels. [204 GB storage usage.]
- [SEED-Data-Edit-Part1-Openimages-65k](https://huggingface.co/datasets/AILab-CVC/SEED-Data-Edit-Part1-Openimages): For image editing data, samples with edited regions smaller than 1/100 were filtered out. Images have a short side ≥ 1024 pixels. [10 GB storage usage.]
- [SEED-Data-Edit-Part2-3-12k](https://huggingface.co/datasets/AILab-CVC/SEED-Data-Edit-Part2-3): For image editing data, samples with edited regions smaller than 1/100 were filtered out. Images have a short side ≥ 1024 pixels. [10 GB storage usage.]
- [PromptfixData-18k](https://huggingface.co/datasets/yeates/PromptfixData): For image restoration data and some editing data, samples with edited regions smaller than 1/100 were filtered out. Images have a short side ≥ 1024 pixels. [9 GB storage usage.]
- [StyleBooth-11k](https://huggingface.co/scepter-studio/stylebooth): For style-transfer data, images have a short side ≥ 1024 pixels. [4 GB storage usage.]
- [Ghibli-36k](https://huggingface.co/datasets/LanguageBind/UniWorld-V1/tree/main/data/Ghibli-36k): For style-transfer data, images have a short side ≥ 1024 pixels. **Warning: This data has not been quality filtered.** [170 GB storage usage.]

Extract & Try-on

- [viton_hd-23k](https://huggingface.co/datasets/forgeml/viton_hd): Converted from the source data into an instruction dataset for product extraction. [1 GB storage usage.]
- [deepfashion-27k](https://huggingface.co/datasets/lirus18/deepfashion): Converted from the source data into an instruction dataset for product extraction. [1 GB storage usage.]
- [shop_product-23k](https://huggingface.co/datasets/LanguageBind/UniWorld-V1/tree/main/data/shop_product-23k): Sourced from internal data of the [Open-Sora Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan), focusing on product extraction and virtual try-on, with images having a short side ≥ 1024 pixels. [12 GB storage usage.]

Image Perception

- [coco2017_caption_canny-236k](https://huggingface.co/datasets/gebinhui/coco2017_caption_canny): img->canny & canny->img [25 GB storage usage.]
- [coco2017_caption_depth-236k](https://huggingface.co/datasets/gebinhui/coco2017_caption_depth): img->depth & depth->img [8 GB storage usage.]
- [coco2017_caption_hed-236k](https://huggingface.co/datasets/gebinhui/coco2017_caption_hed): img->hed & hed->img [13 GB storage usage.]
- [coco2017_caption_mlsd-236k](https://huggingface.co/datasets/gebinhui/coco2017_caption_mlsd): img->mlsd & mlsd->img [ GB storage usage.]
- [coco2017_caption_normal-236k](https://huggingface.co/datasets/gebinhui/coco2017_caption_normal): img->normal & normal->img [10 GB storage usage.]
- [coco2017_caption_openpose-62k](https://huggingface.co/datasets/wangherr/coco2017_caption_openpose): img->pose & pose->img [2 GB storage usage.]
- [coco2017_caption_sketch-236k](https://huggingface.co/datasets/wangherr/coco2017_caption_sketch): img->sketch & sketch->img [15 GB storage usage.]
- [unsplash_canny-20k](https://huggingface.co/datasets/wtcherr/unsplash_10k_canny): img->canny & canny->img [2 GB storage usage.]
- [open_pose-40k](https://huggingface.co/datasets/raulc0399/open_pose_controlnet): img->pose & pose->img [4 GB storage usage.]
- [mscoco-controlnet-canny-less-colors-236k](https://huggingface.co/datasets/hazal-karakus/mscoco-controlnet-canny-less-colors): img->canny & canny->img [13 GB storage usage.]
- [coco2017_seg_box-448k](https://huggingface.co/datasets/LanguageBind/UniWorld-V1/tree/main/data/coco2017_seg_box-448k): img->detection & img->segmentation (mask); instances with regions smaller than 1/100 were filtered out. We visualise masks on the original image as gt-image. [39 GB storage usage.]
- [viton_hd-11k](https://huggingface.co/datasets/forgeml/viton_hd): img->pose [1 GB storage usage.]
- [deepfashion-13k](https://huggingface.co/datasets/lirus18/deepfashion): img->pose [1 GB storage usage.]
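
The three-column `data.txt` format described under "Data preparation" can be read with a few lines of Python. The following is only a minimal sketch, not the loader used by UniWorld-V1: the names `DataEntry` and `parse_data_txt` are hypothetical, and treating the third column case-insensitively (so that both `True` and `true` are accepted) is an assumption made here for convenience.

```python
import csv
import io
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DataEntry:
    """One row of data.txt (hypothetical helper type, not part of UniWorld-V1)."""
    image_root: Path        # column 1: root path of the images
    annotation_json: Path   # column 2: annotation JSON file
    region_weighting: bool  # column 3: enable the region-weighting strategy


def parse_data_txt(text: str) -> list[DataEntry]:
    """Parse the comma-separated, three-column data.txt format."""
    entries = []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(row)}")
        image_root, annotation_json, flag = (cell.strip() for cell in row)
        # Assumption: the flag is compared case-insensitively.
        flag = flag.lower()
        if flag not in ("true", "false"):
            raise ValueError(f"line {line_no}: third column must be true or false")
        entries.append(
            DataEntry(Path(image_root), Path(annotation_json), flag == "true")
        )
    return entries


# Two rows copied from the examples above.
example = """\
data/BLIP3o-60k,json/blip3o_t2i_58859.json,false
data/imgedit/add/add_part0,json/imgedit/laion_add_part0_edit.json,true
"""

entries = parse_data_txt(example)
for entry in entries:
    print(entry.image_root, entry.annotation_json, entry.region_weighting)
```

Validating the column count and the flag up front means a malformed row is reported with its line number before any training job starts, rather than failing later while images are being loaded.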

Provider:

maas

Created:

2025-06-05
