BryceJia/MICo-150K
Source: Hugging Face. Updated 2026-03-04; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/BryceJia/MICo-150K
---
license: apache-2.0
tags:
- Multi-Image Composition
- Image Editing
- Image Generation
size_categories:
- 100K<n<1M
task_categories:
- image-to-image
language:
- en
---
# MICo-150K Dataset
[Paper (PDF)](https://www.arxiv.org/pdf/2512.07348) [arXiv](https://www.arxiv.org/abs/2512.07348) [GitHub](https://github.com/A113N-W3I/MICo-150K) [Project Page](https://mico-150k.github.io/)
## 🌟 Catalogue
1. [Dataset Details](#🔍-dataset-details)
2. [Data Structure](#📃-data-structure)
3. [Gallery: Decompose & Recompose](#gallery-dere-subset)
4. [Gallery: Human Centric Tasks](#gallery-human-centric)
5. [Gallery: Object Centric Tasks](#gallery-object-centric)
6. [Gallery: Human Object Interaction](#gallery-hoi)
## 📖 Introduction
**MICo-150K** is a large-scale synthetic dataset generated by **Nano Banana** and **Nano Banana Pro**, designed to advance open-source models in **M**ulti-**I**mage **Co**mposition (MICo).
We fine-tune a diverse set of base models on MICo-150K, including [Qwen-Image](https://github.com/QwenLM/Qwen-Image), [BAGEL](https://bagel-ai.org/), [OmniGen2](https://github.com/VectorSpaceLab/OmniGen2), [Lumina-DiMOO](https://github.com/Alpha-VLLM/Lumina-DiMOO), and [BLIP3o-Next-Edit](https://github.com/JiuhaiChen/BLIP3o). All models improve substantially on our proposed MICo Bench after fine-tuning. Notably, Qwen-Image, originally developed as a text-to-image model, achieves remarkable gains after adaptation: the fine-tuned variant, Qwen-Image-MICo, surpasses Qwen-Image-Edit-2509 on both the MICo Bench and OmniContext benchmarks, highlighting its strong generalization in multi-image composition scenarios.

## 🔍 Dataset Details
We organize the MICo-150K dataset into **three primary categories** and **one specialized task subset, [De&Re](#gallery-dere-subset)**. Each primary category encompasses multiple sub-tasks:
* [Human-Centric Tasks](#gallery-human-centric)
* [Object-Centric Tasks](#gallery-object-centric)
* [Human–Object Interaction (HOI) Tasks](#gallery-hoi)
### Human-Centric Tasks
1. Two Persons (3K samples each for two males, two females, and one male–one female scenarios)
2. Three Persons (3K samples each for three males, three females, one male–two females, and two males–one female scenarios)
3. Person(s) + Scene (6K samples for one person + scene and 6K samples for two persons + scene)
### Object-Centric Tasks
1. Multi-Object Composition (5K samples each for two, three, four, and five objects)
2. Object(s) + Scene (5K samples each for one object + scene and two objects + scene)
### Human–Object Interaction (HOI) Tasks
1. Person + Apparel (6K samples each for one person with one, two, three, and four apparel items)
2. Person + Object (6K samples each for one person + one object, one person + two objects, two persons + one object, and two persons + two objects)
3. Person + Apparel + Object (6K samples each for 1H1C1O, 1H1C2O, 1H2C1O, and 1H2C2O configurations)
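
The configuration codes in the last item can be unpacked mechanically. The sketch below assumes that `H`, `C`, and `O` stand for humans (persons), clothing (apparel items), and objects respectively; the card does not spell this out, and `parse_config` is a hypothetical helper, not part of the dataset tooling.

```python
import re

# Assumed reading of the shorthand: H = human (person), C = clothing (apparel), O = object.
CODE_PATTERN = re.compile(r"^(\d+)H(\d+)C(\d+)O$")


def parse_config(code: str) -> dict:
    """Split a code such as '1H2C1O' into its per-element counts."""
    match = CODE_PATTERN.match(code)
    if match is None:
        raise ValueError(f"unrecognized configuration code: {code!r}")
    persons, apparel, objects = (int(group) for group in match.groups())
    return {"persons": persons, "apparel": apparel, "objects": objects}


for code in ["1H1C1O", "1H1C2O", "1H2C1O", "1H2C2O"]:
    parts = parse_config(code)
    # Assuming one input image per element, the sum is the number of inputs.
    print(code, parts, "elements:", sum(parts.values()))
```

Under the additional assumption that each element is supplied as its own input image, the element total gives the expected length of `input_images` for that configuration.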
### De&Re Subset
The De&Re subset focuses on decomposition and recomposition tasks. Specifically, elements from a real-world image (e.g., persons, apparel, scenes) are first decomposed into multiple component images. These components are subsequently recomposed into a single composite image according to a specified instruction. This subset contains 11K samples in total.
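
As a quick sanity check, the per-task figures listed in this section can be tallied. This is only a sketch: the counts are the rounded "K" values quoted above, so the result is approximate and may not match the exact number of rows in the Parquet files.

```python
K = 1_000

# Rounded sample counts as quoted in this section (scenarios x samples per scenario).
counts = {
    "Human-Centric": {
        "Two Persons": 3 * 3 * K,
        "Three Persons": 4 * 3 * K,
        "Person(s) + Scene": 6 * K + 6 * K,
    },
    "Object-Centric": {
        "Multi-Object Composition": 4 * 5 * K,
        "Object(s) + Scene": 2 * 5 * K,
    },
    "HOI": {
        "Person + Apparel": 4 * 6 * K,
        "Person + Object": 4 * 6 * K,
        "Person + Apparel + Object": 4 * 6 * K,
    },
    "De&Re": {"Decompose & Recompose": 11 * K},
}

category_totals = {name: sum(tasks.values()) for name, tasks in counts.items()}
grand_total = sum(category_totals.values())

for name, total in category_totals.items():
    print(f"{name}: {total // K}K")
print(f"Total: {grand_total // K}K")
```

With these rounded figures the categories come to 33K, 30K, 72K, and 11K, for roughly 146K samples overall.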
## 📃 Data Structure
For Human-Centric, Object-Centric, and HOI tasks, the Parquet files share a unified schema with the following keys:
* `input_images`: a list of input images
* `output_image`: the composed image
* `instruction`: a descriptive or imperative instruction specifying how to compose the input images
* `separate_prompt`: captions corresponding to each input image
* `editing_type`: a label describing the task type
For the De&Re subset, the Parquet files contain the following keys:
* `reference`: the real image used for decomposition
* `input`: a list of images obtained by decomposing the `reference` image
* `output`: the recomposed image
* `instruction`: a description of how to compose the decomposed input images
* `separate_prompt`: captions corresponding to each decomposed input image
* `editing_type`: a label describing the task type
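
Because the two schemas use different key names for the inputs and the output, a loader may want to map both onto one layout. The following is a minimal sketch that operates on plain dictionaries (one per Parquet row); `identify_schema` and `normalize` are hypothetical helpers, and the placeholder strings stand in for real image data.

```python
# Key sets taken from the two schemas described above.
COMPOSITION_KEYS = {"input_images", "output_image", "instruction",
                    "separate_prompt", "editing_type"}
DERE_KEYS = {"reference", "input", "output", "instruction",
             "separate_prompt", "editing_type"}


def identify_schema(record: dict) -> str:
    """Return 'de_re' or 'composition' depending on which key set is present."""
    keys = set(record)
    if DERE_KEYS <= keys:
        return "de_re"
    if COMPOSITION_KEYS <= keys:
        return "composition"
    raise ValueError(f"record matches neither schema; keys: {sorted(keys)}")


def normalize(record: dict) -> dict:
    """Map either schema onto one layout (field names here are illustrative)."""
    is_dere = identify_schema(record) == "de_re"
    return {
        "inputs": record["input"] if is_dere else record["input_images"],
        "output": record["output"] if is_dere else record["output_image"],
        "reference": record["reference"] if is_dere else None,
        "instruction": record["instruction"],
        "captions": record["separate_prompt"],
        "editing_type": record["editing_type"],
    }


# Toy rows with placeholder values instead of real images.
composition_row = {"input_images": ["img_a", "img_b"], "output_image": "img_out",
                   "instruction": "Place both people in one photo.",
                   "separate_prompt": ["a man", "a woman"], "editing_type": "example"}
dere_row = {"reference": "img_ref", "input": ["img_a", "img_b"], "output": "img_out",
            "instruction": "Recompose the person and the scene.",
            "separate_prompt": ["a person", "a beach"], "editing_type": "example"}

print(normalize(composition_row)["inputs"])
print(normalize(dere_row)["reference"])
```

Checking for the De&Re keys first keeps the test unambiguous, since `instruction`, `separate_prompt`, and `editing_type` appear in both schemas.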
## 🖼️ Gallery
### Gallery: De&Re Subset






### Gallery: Human Centric




### Gallery: Object Centric



### Gallery: HOI




## ✨ Citation
If you find this dataset or the associated work useful for your research, please cite the paper:
```bib
@article{wei2025mico,
title={MICo-150K: A Comprehensive Dataset Advancing Multi-Image Composition},
author={Wei, Xinyu and Cen, Kangrui and Wei, Hongyang and Guo, Zhen and Li, Bairui and Wang, Zeqing and Zhang, Jinrui and Zhang, Lei},
journal={arXiv preprint arXiv:2512.07348},
year={2025}
}
```
Provided by:
BryceJia



