Resource overview:
---
license: cc-by-nc-sa-4.0
task_categories:
- text-to-image
language:
- en
tags:
- decomposition
- RGBA
- multi-layer
- COCO
- LVIS
- LAION
pretty_name: MuLAn
size_categories:
- 10K<n<100K
---
# MuLAn: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation
MuLAn is a novel dataset comprising over 44K MUlti-Layer ANnotations of RGB images as multilayer, instance-wise RGBA decompositions, and over 100K instance images. It is composed of the MuLAn-COCO and MuLAn-LAION sub-datasets, which contain image decompositions that vary in style, composition and complexity. MuLAn is the first photorealistic resource to provide instance decomposition and occlusion information for high-quality images, opening up new avenues for text-to-image generative AI research. With it, we aim to encourage the development of novel generation and editing technology, in particular layer-wise solutions.
# FAQ:
1) The LVIS dataset is built on the same images as the COCO 2017 dataset, so please make sure you have the correct version of the dataset. Furthermore, the images come from both the training and validation splits, so check both subfolders of the original dataset.
2) The LAION dataset is based on LAION Aesthetic v2 6.5+. The filename linking only works if the dataset was downloaded with the img2dataset library, because the filename is the row index in the parquet file. Furthermore, given the [LAION situation](https://laion.ai/notes/laion-maintanence/), we will not release the links to the original images until their review has finished and we have also concluded that none of the original images violates GDPR.
# Dataset format
In order to respect the base datasets' licenses, we have released MuLAn in annotation format.
Each image is associated with a pickle file structured as below. We have also released a small script that, given a CSV with the base image/annotation pairs, automatically reconstructs the decomposed images and saves the captioning and path metadata in a separate CSV.
```
"captioning": {
"llava": LLaVa model details
"blip2": BLIP 2 model details
"clip": CLIP model details
}
"background": {
"llava": Detailed background LLaVa caption
"blip2": COCO style BLIP 2 background caption chosen by CLIP
"original_image_mask": Original image background content mask
"inpainted_delta": Additive inpainted background content
}
"image": {
"llava": Detailed original image LLaVa caption
"blip2": COCO style BLIP 2 original image caption chosen by CLIP.
}
"instances": {
"blip2": COCO style BLIP 2 instance caption chosen by CLIP.
"original_image_mask": Original image instance content mask
"inpainted_delta": Additive inpainted instance content
"instance_alpha": Alpha layer of the inpainted instance
}
```
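As a minimal sketch of inspecting one annotation, the snippet below assumes that the `.p.zl` extension used for annotation files denotes a pickle compressed with zlib. That is a guess from the file name and is not confirmed here; `load_annotation` and `summarise` are hypothetical helpers, and the released `dataset_decomposition.py` remains the reference for the actual format.

```python
import pickle
import zlib
from pathlib import Path


def load_annotation(path):
    """Load one MuLAn annotation file.

    Assumption: the file is a pickle compressed with zlib (guessed from the
    `.p.zl` extension). Only unpickle files that come from a trusted source.
    """
    raw = Path(path).read_bytes()
    return pickle.loads(zlib.decompress(raw))


def summarise(annotation):
    """Return, for each top-level key, its sub-keys (or its type name)."""
    summary = {}
    for key, value in annotation.items():
        if isinstance(value, dict):
            summary[key] = sorted(value)
        else:
            summary[key] = type(value).__name__
    return summary
```

Calling `summarise(load_annotation("<path_to_annotation>/<image_id>.p.zl"))` should list keys such as `captioning`, `background`, `image` and `instances` if the assumption about the file format holds.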
# Dataset decomposition
First, make sure you have the `unrar` package for Ubuntu. You can install it with the following command:
```
sudo apt-get install rar unrar
```
Then the command below will extract the dataset.
```
unrar x -e mulan.part001.rar
```
Afterwards, create the required conda environment:
```
conda env create --name mulan --file=mulan_env.yml
conda activate mulan
```
Then manually create a CSV with two columns, `image` and `annotation`, similar to the toy example below. ***Please pay attention to the COCO dataset*** specifically, as some base images are from the `train2017` subset and some are from the `val2017` one.
```
image, annotation
<path_to_image>/<image_id>.jpg, <path_to_annotation>/<image_id>.p.zl
<path_to_image>/<image_id>.jpg, <path_to_annotation>/<image_id>.p.zl
<path_to_image>/<image_id>.jpg, <path_to_annotation>/<image_id>.p.zl
```
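If writing the CSV by hand is impractical, the pairing can be sketched in a few lines. This is only an illustration: `build_pairs` and `write_csv` are hypothetical helpers, and the sketch assumes that images are `.jpg` files whose stem matches the annotation's image id and that each folder is flat. Passing both COCO subfolders covers base images from either split.

```python
import csv
from pathlib import Path


def build_pairs(annotation_dir, image_dirs, suffix=".p.zl"):
    """Pair each annotation with the base image that shares its image id.

    `image_dirs` is searched in order, so passing both COCO subfolders
    (for example train2017 and val2017) finds images from either split.
    Returns the matched (image, annotation) pairs and the unmatched ids.
    """
    pairs, missing = [], []
    for ann in sorted(Path(annotation_dir).glob(f"*{suffix}")):
        image_id = ann.name[: -len(suffix)]
        for image_dir in image_dirs:
            candidate = Path(image_dir) / f"{image_id}.jpg"
            if candidate.is_file():
                pairs.append((str(candidate), str(ann)))
                break
        else:
            # No folder contained an image with this id.
            missing.append(image_id)
    return pairs, missing


def write_csv(pairs, csv_path):
    """Write the `image` and `annotation` columns described above."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["image", "annotation"])
        writer.writerows(pairs)
```

Inspecting the returned `missing` list before running the decomposition is a cheap way to catch a wrong dataset version or a forgotten subfolder.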
We advise creating two separate CSVs, one for the COCO dataset and one for LAION Aesthetic v2 6.5, in order to guarantee no image id clashes.
The provided script can then be used to reconstruct the RGBA stacks. Please be advised that we use joblib to parallelise the decomposition, so your CPU and I/O might be heavily loaded while the script is running.
Be careful of the following:
- `output_path` needs to be without the trailing `/`
- `number_of_processes` if unspecified will default to `2 * number of cores`
```
python3 dataset_decomposition.py \
--csv_path='/path/to/images/and/annotations/file.csv' \
--output_path='/path/to/where/images/will/be/decomposed' \
--number_of_processes=<<number of cores>>
```
In `/path/to/where/images/will/be/decomposed`, the script will generate multiple images per original RGB image, following the structure below, as well as a `meta_data.csv` file. The CSV has three columns: `paths` of the individual layers, the `blip2` caption of each layer and the `llava` caption of the same layer. The `llava` caption will be `N/A` for instances, as we have not generated those.
```
<<image_id>>-layer_0.png - Background RGB Image
<<image_id>>-layer_x.png - Instance X RGBA Image
```
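To work with the output per image rather than per layer, the rows of `meta_data.csv` can be grouped by image id. The sketch below is an illustration under two assumptions: the CSV columns are literally named `paths`, `blip2` and `llava`, and layer files follow the `<image_id>-layer_<n>.png` naming shown above. `group_layers` is a hypothetical helper, not part of the released script.

```python
import csv
import re
from collections import defaultdict
from pathlib import Path

# Matches "<image_id>-layer_<n>.png" and captures both parts.
LAYER_NAME = re.compile(r"^(?P<image_id>.+)-layer_(?P<index>\d+)\.png$")


def group_layers(meta_csv):
    """Map each image id to its layers, sorted by layer index.

    Each layer is a tuple (index, path, blip2 caption, llava caption).
    Assumption: the columns are named `paths`, `blip2` and `llava`.
    """
    stacks = defaultdict(list)
    with open(meta_csv, newline="") as handle:
        for row in csv.DictReader(handle):
            match = LAYER_NAME.match(Path(row["paths"]).name)
            if match is None:
                continue  # not a layer file, skip it
            stacks[match["image_id"]].append(
                (int(match["index"]), row["paths"], row["blip2"], row["llava"])
            )
    return {image_id: sorted(layers) for image_id, layers in stacks.items()}
```

In the returned mapping, the entry with index 0 is the background RGB image and the remaining entries are the instance RGBA images.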
# Examples
## COCO


## LAION Aesthetic v2 6.5


# Possible applications
## Instance Addition through MuLAn finetuned InstructPix2Pix

## Instance Generation through MuLAn finetuned StableDiffusion v1.5

# Reference
Please do not forget to cite our work if you are using this dataset in your research.
Corresponding author is Petru-Daniel Tudosiu (petru.daniel.tudosiu@huawei.com).
```
@article{tudosiu2024mulan,
title={MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation},
author={Petru-Daniel Tudosiu and Yongxin Yang and Shifeng Zhang and Fei Chen and Steven McDonagh and Gerasimos Lampouras and Ignacio Iacobacci and Sarah Parisot},
year={2024},
eprint={2404.02790},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```