mulan-dataset/v1.0

Name: mulan-dataset/v1.0
Creator: mulan-dataset
Published: 2024-04-26 13:53:51
License: no description provided

Hugging Face · Updated 2024-04-26 · Indexed 2024-04-19

Download link:

https://hf-mirror.com/datasets/mulan-dataset/v1.0


Resource description:

---
license: cc-by-nc-sa-4.0
task_categories:
- text-to-image
language:
- en
tags:
- decomposition
- RGBA
- multi-layer
- COCO
- LVIS
- LAION
pretty_name: MuLAn
size_categories:
- 10K<n<100K
---

# MuLAn: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation

MuLAn is a novel dataset comprising over 44K MUlti-Layer ANnotations of RGB images as multilayer, instance-wise RGBA decompositions, and over 100K instance images. It is composed of the MuLAn-COCO and MuLAn-LAION sub-datasets, which contain a variety of image decompositions in terms of style, composition and complexity. With MuLAn, we provide the first photorealistic resource offering instance decomposition and occlusion information for high-quality images, opening up new avenues for text-to-image generative AI research. With this, we aim to encourage the development of novel generation and editing technology, in particular layer-wise solutions.

# FAQ

1) The LVIS dataset is equivalent to the COCO 2017 dataset, so please make sure you have the correct version of the dataset. Furthermore, the images come from both the training and validation splits, so check both subfolders of the original dataset.
2) The LAION dataset is based on LAION Aesthetic v2 6.5+, and the filename linking will only work if it was downloaded with the img2dataset library, because the row index in the parquet file is the filename. Furthermore, given the [LAION situation](https://laion.ai/notes/laion-maintanence/), we will not release the links to the original images until their review has finished and we have also concluded that none of the original images violate GDPR.

# Dataset format

In order to respect the base datasets' licences, we have released MuLAn in annotation format. Each image is associated with a pickle file structured as below.
We have also released a small script that, given a CSV with the base image/annotation pairs, automatically reconstructs the decomposed images and saves the captioning and path metadata in a separate CSV.

```
"captioning": {
    "llava": LLaVa model details
    "blip2": BLIP 2 model details
    "clip": CLIP model details
}
"background": {
    "llava": Detailed background LLaVa caption
    "blip2": COCO style BLIP 2 background caption chosen by CLIP
    "original_image_mask": Original image background content mask
    "inpainted_delta": Additive inpainted background content
}
"image": {
    "llava": Detailed original image LLaVa caption
    "blip2": COCO style BLIP 2 original image caption chosen by CLIP.
}
"instances": {
    "blip2": COCO style BLIP 2 instance caption chosen by CLIP.
    "original_image_mask": Original image instance content mask
    "inpainted_delta": Additive inpainted instance content
    "instance_alpha": Alpha layer of the inpainted instance
}
```

# Dataset decomposition

First, make sure you have the `unrar` package for Ubuntu. You can install it with the following command.

```
sudo apt-get install rar unrar
```

Then the command below will extract the dataset.

```
unrar x -e mulan.part001.rar
```

Afterwards, create the required conda environment.

```
conda env create --name mulan --file=mulan_env.yml
conda activate mulan
```

Then manually create a CSV with two columns, `image` and `annotation`, similar to the toy example below. ***Please pay attention to the COCO dataset*** specifically, as some base images are from the `train2017` subset and some are from the `val2017` one.

```
image, annotation
<path_to_image>/<image_id>.jpg, <path_to_annotation>/<image_id>.p.zl
<path_to_image>/<image_id>.jpg, <path_to_annotation>/<image_id>.p.zl
<path_to_image>/<image_id>.jpg, <path_to_annotation>/<image_id>.p.zl
```

We advise creating two separate CSVs, one for the COCO dataset and one for LAION Aesthetic V2 6.5, in order to guarantee no image id clashes.
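Building the CSV by hand is error-prone when COCO images are spread over two folders, so the pairing can be scripted. The following is a minimal sketch, not part of the released tooling: the helper names `build_pairs` and `write_pairs_csv` are hypothetical, and it assumes every annotation is named `<image_id>.p.zl` and its base image `<image_id>.jpg`, as in the toy example. It writes the header as `image,annotation` without the space shown in the toy example, which is also an assumption about what the script accepts.

```python
import csv
from pathlib import Path


def build_pairs(annotation_dir, image_dirs, suffix=".p.zl"):
    """Pair each annotation file with the base image that shares its id.

    Returns (pairs, missing): pairs is a list of (image_path, annotation_path)
    tuples, missing lists the image ids for which no base image was found.
    """
    pairs, missing = [], []
    for annotation in sorted(Path(annotation_dir).glob(f"*{suffix}")):
        image_id = annotation.name[: -len(suffix)]
        # Check every candidate folder, e.g. both train2017 and val2017.
        for image_dir in image_dirs:
            candidate = Path(image_dir) / f"{image_id}.jpg"
            if candidate.is_file():
                pairs.append((str(candidate), str(annotation)))
                break
        else:
            missing.append(image_id)
    return pairs, missing


def write_pairs_csv(pairs, csv_path):
    """Write the two-column CSV (header row: image, annotation)."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["image", "annotation"])
        writer.writerows(pairs)
```

For COCO, `image_dirs` would typically list both the `train2017` and `val2017` folders; any ids returned in `missing` point to images that were not downloaded or live elsewhere. Running it once per base dataset keeps the COCO and LAION CSVs separate.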
The provided script can then be used to reconstruct the RGBA stacks. Please be advised that we use joblib to parallelise the decomposition, so your CPU and I/O might be heavily impacted while the script is running. Be careful of the following:

- `output_path` must not have a trailing `/`
- `number_of_processes`, if unspecified, defaults to `2 * number of cores`

```
python3 dataset_decomposition.py \
    --csv_path='/path/to/images/and/annotations/file.csv' \
    --output_path='/path/to/where/images/will/be/decomposed' \
    --number_of_processes=<<number of cores>>
```

In `/path/to/where/images/will/be/decomposed`, the script will generate multiple images per original RGB image, following the structure below, as well as a `meta_data.csv` file. The CSV has three columns: `paths` of the individual layers, the `blip2` caption of the layer, and the `llava` caption of the same layer. The `llava` caption will be `N/A` for instances, as we have not generated those.

```
<<image_id>>-layer_0.png - Background RGB Image
<<image_id>>-layer_x.png - Instance X RGBA Image
```

# Examples

## COCO

![COCO Example 1](static/COCO7.png)
![COCO Example 2](static/COCO2.png)

## LAION Aesthetic v2 6.5

![LAION Example 1](static/LAION1.png)
![LAION Example 2](static/LAION7.png)

# Possible applications

## Instance Addition through MuLAn finetuned InstructPix2Pix

![alt text](static/mulan_ip2p.webp)

## Instance Generation through MuLAn finetuned StableDiffusion v1.5

![alt text](static/rgba-generation.webp)

# Reference

Please do not forget to cite our work if you are using this dataset in your research.

Corresponding author is Petru-Daniel Tudosiu (petru.daniel.tudosiu@huawei.com).
```
@article{tudosiu2024mulan,
  title={MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation},
  author={Petru-Daniel Tudosiu and Yongxin Yang and Shifeng Zhang and Fei Chen and Steven McDonagh and Gerasimos Lampouras and Ignacio Iacobacci and Sarah Parisot},
  year={2024},
  eprint={2404.02790},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```


Provided by:

mulan-dataset

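Summary of original information

Dataset overview

Dataset name

MuLAn: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation

Dataset description

MuLAn is a dataset of over 44K multi-layer annotations of RGB images, together with over 100K instance images. It is composed of the MuLAn-COCO and MuLAn-LAION sub-datasets, which cover image decompositions that vary in style, composition, and complexity. It is the first photorealistic resource to provide instance decomposition and occlusion information for high-quality images, opening new avenues for text-to-image generative AI research.

Dataset composition

MuLAn-COCO
MuLAn-LAION

Dataset features

Multi-layer annotations (RGBA decomposition)
Instance-level decomposition
Includes occlusion information

Dataset applications

Encouraging the development of novel generation and editing technology, in particular layer-wise solutions.

Dataset format

Each image is associated with a pickle file, structured as follows: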

"captioning": {
    "llava": LLaVa model details
    "blip2": BLIP 2 model details
    "clip": CLIP model details
}
"background": {
    "llava": Detailed background LLaVa caption
    "blip2": COCO style BLIP 2 background caption chosen by CLIP
    "original_image_mask": Original image background content mask
    "inpainted_delta": Additive inpainted background content
}
"image": {
    "llava": Detailed original image LLaVa caption
    "blip2": COCO style BLIP 2 original image caption chosen by CLIP.
}
"instances": {
    "blip2": COCO style BLIP 2 instance caption chosen by CLIP.
    "original_image_mask": Original image instance content mask
    "inpainted_delta": Additive inpainted instance content
    "instance_alpha": Alpha layer of the inpainted instance
}
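To inspect a single annotation before running the full decomposition, something like the following could be used. It is a sketch under a loud assumption: the `.p.zl` extension is taken to mean a zlib-compressed pickle, which the dataset card does not state explicitly, so the loader falls back to a plain pickle if decompression fails. The names `load_annotation` and `missing_keys` are hypothetical helpers, not functions from the released script.

```python
import pickle
import zlib

# Top-level keys listed in the dataset card's format description.
EXPECTED_KEYS = ("captioning", "background", "image", "instances")


def load_annotation(path):
    """Load one annotation file.

    ASSUMPTION: '.p.zl' is a zlib-compressed pickle. If the bytes are not
    zlib data, they are treated as an uncompressed pickle instead.
    Only unpickle files obtained from a source you trust.
    """
    with open(path, "rb") as handle:
        raw = handle.read()
    try:
        raw = zlib.decompress(raw)
    except zlib.error:
        pass  # not zlib-compressed; fall through to plain pickle
    return pickle.loads(raw)


def missing_keys(annotation):
    """Return the documented top-level keys that are absent."""
    return [key for key in EXPECTED_KEYS if key not in annotation]
```

Calling `missing_keys(load_annotation(path))` on a few files is a cheap check that the assumed file format matches the documented structure before any images are reconstructed.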

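Usage notes

Make sure you use the correct versions of the COCO and LAION datasets.
Images come from both the training and validation splits of the original datasets.
Because the LAION dataset is under review, links to the original images are not provided for now.

Dataset license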

cc-by-nc-sa-4.0

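Curated summary

Dataset introduction

Construction

In computer vision and generative AI, the quality of a dataset's construction directly affects model performance. MuLAn combines two established image sources, COCO 2017 and LAION Aesthetic v2 6.5+, to build more than 44,000 multi-layer annotations. Each annotation is an instance-wise RGBA decomposition covering the background and the foreground objects together with their alpha channels. During construction, the team used vision-language models such as BLIP 2 and LLaVa to generate captions and used image inpainting to separate the content of each layer, then stored the structured annotations as pickle files for consistency and reuse.

How to use

Before using MuLAn, download the original COCO and LAION images and match them to the corresponding annotations via the provided filename linking. After extracting the archive and setting up the dedicated Conda environment, run the accompanying script to reconstruct the decomposed images. The script supports multi-process parallelism and converts the pickle annotations into a background layer and multiple instance RGBA layers, while also generating a metadata file with paths and captions. Researchers can build training pipelines on top of this output to fine-tune diffusion models or develop layer-aware generation algorithms for editing tasks such as object addition, removal, or style transfer.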
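As an illustration of how the generated metadata file could feed a training pipeline, the sketch below groups its rows into per-image layer stacks. It relies on the three documented columns (`paths`, `blip2`, `llava`) and the documented `<<image_id>>-layer_x.png` naming; the function name `group_layers` and the choice to map the `N/A` placeholder to `None` are illustrative assumptions.

```python
import csv
import re
from collections import defaultdict
from pathlib import Path

# Matches the documented layer naming: <image_id>-layer_<index>.png
LAYER_NAME = re.compile(r"^(?P<image_id>.+)-layer_(?P<index>\d+)\.png$")


def group_layers(meta_csv_path):
    """Group metadata rows by image id, ordered by layer index.

    Returns {image_id: [(index, path, blip2_caption, llava_caption), ...]}.
    The 'N/A' placeholder used for instance layers becomes None.
    """
    stacks = defaultdict(list)
    with open(meta_csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            match = LAYER_NAME.match(Path(row["paths"]).name)
            if match is None:
                continue  # skip rows that do not follow the layer naming
            llava = row["llava"] if row["llava"] != "N/A" else None
            stacks[match["image_id"]].append(
                (int(match["index"]), row["paths"], row["blip2"], llava)
            )
    return {
        image_id: sorted(layers, key=lambda layer: layer[0])
        for image_id, layers in stacks.items()
    }
```

Each resulting stack starts with layer 0, the background RGB image, followed by the instance RGBA layers, which makes it straightforward to iterate over one image's layers and captions together.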

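Background and challenges

Background overview

With text-to-image generation advancing rapidly, controllable generation has become a research focus, yet existing datasets often lack fine-grained annotation of an image's internal layer structure. MuLAn was created in 2024 by a research team from Huawei and other institutions to open new paths for controllable text-to-image generation by providing multi-layer, instance-wise RGBA decomposition annotations. It combines COCO and LAION subsets, covers more than 44,000 multi-layer annotations and more than 100,000 instance images, and is the first resource to provide instance decomposition and occlusion information for high-quality images, advancing generative AI research on layered editing and compositing.

Current challenges

MuLAn targets the problems of instance-level decomposition and occlusion modelling in controllable text-to-image generation, where traditional methods struggle to separate overlapping objects precisely while keeping the result visually consistent. Its construction faced several difficulties. First, the licences and formats of the different source datasets (such as COCO and LAION) had to be reconciled so that the annotations remain lawful and consistent. Second, multi-layer annotation of images at scale involves complex mask generation and inpainting, which is demanding in both compute and algorithmic accuracy. Third, the reconstruction workflow relies on parallel processing and is easily limited by hardware I/O performance, which adds complexity to deployment and use.

Common scenarios

Classic use cases

In controllable text-to-image generation, MuLAn's multi-layer, instance-wise RGBA decomposition annotations give generative models a fine-grained, structured supervision signal. The dataset is commonly used to train or fine-tune diffusion models such as Stable Diffusion so that specific instances in an image can be generated and edited independently. By decomposing an image into a background layer and multiple transparent instance layers, researchers can build priors over inter-layer occlusion relationships and so improve the controllability of generative models in complex scene synthesis.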
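The relationship between a background layer and transparent instance layers can be made concrete with the standard alpha "over" operator: each instance pixel replaces the pixel beneath it in proportion to its alpha. The sketch below uses plain Python on nested lists with values in [0, 1] to stay dependency-free. It assumes, without confirmation from the dataset card, that instance layers are supplied in back-to-front order; `over` and `flatten_stack` are hypothetical names.

```python
def over(top_rgba, bottom_rgb):
    """Composite one RGBA pixel over an opaque RGB pixel (values in 0..1)."""
    red, green, blue, alpha = top_rgba
    return tuple(
        alpha * top + (1.0 - alpha) * bottom
        for top, bottom in zip((red, green, blue), bottom_rgb)
    )


def flatten_stack(background, instance_layers):
    """Flatten a layer stack into a single RGB image.

    background: rows of RGB pixels (the opaque layer 0).
    instance_layers: list of RGBA images with the same size as the
    background. ASSUMPTION: they are ordered back to front, so later
    layers occlude earlier ones.
    """
    image = [list(row) for row in background]
    for layer in instance_layers:
        for y, row in enumerate(layer):
            for x, pixel in enumerate(row):
                image[y][x] = over(pixel, image[y][x])
    return image
```

Dropping a layer from `instance_layers` before flattening corresponds to removing that instance from the scene, which is the kind of layer-wise edit the decomposition is meant to support. For full-size images the same arithmetic would normally be vectorised with an array library.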

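Academic problems addressed

MuLAn addresses the lack of instance-level controllability in text-to-image generation. Traditional methods often cannot precisely separate overlapping objects in an image, whereas the instance masks and RGBA layers in this dataset expose occlusion and layer structure directly, giving models key data for learning spatial reasoning. Its significance lies in moving generative models beyond reliance on global text guidance alone, promoting layer-aware generation, and providing a data foundation for compositional scene understanding in computer vision.

Practical applications

In practice, MuLAn supports smarter image editing and content creation tools. With its multi-layer annotations, developers can train editing models that add, remove, or replace instances, for example by fine-tuning InstructPix2Pix for interactive object manipulation. The dataset can also be used for game asset generation, advertising design, and similar scenarios, where layer-wise control makes it possible to compose visual content that meets requirements quickly.

Recent research on the dataset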