BryceJia/MICo-150K

Source: Hugging Face, updated 2026-03-04, indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/BryceJia/MICo-150K
Description:
---
license: apache-2.0
tags:
- Multi-Image Composition
- Image Editing
- Image Generation
size_categories:
- 100K<n<1M
task_categories:
- image-to-image
language:
- en
---

# MICo-150K Dataset

[![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://www.arxiv.org/pdf/2512.07348) [![ArXiv](https://img.shields.io/badge/arXiv-A42C25?style=for-the-badge&logo=arxiv&logoColor=white&color=blue)](https://www.arxiv.org/abs/2512.07348) [![Github](https://img.shields.io/badge/MICo150K-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white)](https://github.com/A113N-W3I/MICo-150K) [![Project Page](https://img.shields.io/badge/Project_Page-00CED1?style=for-the-badge&logo=web&logoColor=white)](https://mico-150k.github.io/)

## 🌟 Catalogue

1. [Dataset Details](#🔍-dataset-details)
2. [Data Structure](#📃-data-structure)
3. [Gallery: Decompose & Recompose](#gallery-dere-subset)
4. [Gallery: Human Centric Tasks](#gallery-human-centric)
5. [Gallery: Object Centric Tasks](#gallery-object-centric)
6. [Gallery: Human Object Interaction](#gallery-hoi)

## 📖 Introduction

**MICo-150K** is a large-scale synthetic dataset generated by **Nano Banana** and **Nano Banana Pro**, designed to advance open-source models in **M**ulti-**I**mage **Co**mposition (MICo). We fine-tune a diverse set of base models, including [Qwen-Image](https://github.com/QwenLM/Qwen-Image), [BAGEL](https://bagel-ai.org/), [OmniGen2](https://github.com/VectorSpaceLab/OmniGen2), [Lumina-DiMOO](https://github.com/Alpha-VLLM/Lumina-DiMOO), and [BLIP3o-Next-Edit](https://github.com/JiuhaiChen/BLIP3o), on MICo-150K. All of these models show substantial performance improvements on our proposed MICo Bench after fine-tuning. Notably, Qwen-Image, originally developed as a text-to-image model, achieves remarkable gains after adaptation.
The fine-tuned variant, Qwen-Image-MICo, surpasses Qwen-Image-Edit-2509 on both the MICo Bench and OmniContext benchmarks, highlighting its strong generalization capability and broad applicability in multi-image composition scenarios.

![dataset-case-compressed](https://cdn-uploads.huggingface.co/production/uploads/655db6a58c2d4379a70837c0/ScCECWMm8hVa2W_csX8_u.jpeg)

## 🔍 Dataset Details

We organize the MICo-150K dataset into **three primary categories** and **one specialized task subset, [De&Re](#gallery-dere-subset)**. Each primary category encompasses multiple sub-tasks:

* [Human-Centric Tasks](#gallery-human-centric)
* [Object-Centric Tasks](#gallery-object-centric)
* [Human–Object Interaction (HOI) Tasks](#gallery-hoi)

### Human-Centric Tasks

1. Two Persons (3K samples each for two males, two females, and one male–one female scenarios)
2. Three Persons (3K samples each for three males, three females, one male–two females, and two males–one female scenarios)
3. Person(s) + Scene (6K samples for one person + scene and 6K samples for two persons + scene)

### Object-Centric Tasks

1. Multi-Object Composition (5K samples each for two, three, four, and five objects)
2. Object(s) + Scene (5K samples each for one object + scene and two objects + scene)

### Human–Object Interaction (HOI) Tasks

1. Person + Apparel (6K samples each for one person with one, two, three, and four apparel items)
2. Person + Object (6K samples each for one person + one object, one person + two objects, two persons + one object, and two persons + two objects)
3. Person + Apparel + Object (6K samples each for the 1H1C1O, 1H1C2O, 1H2C1O, and 1H2C2O configurations, where the digits before H, C, and O give the number of persons, apparel items, and objects)

### De&Re Subset

The De&Re subset focuses on decomposition and recomposition tasks. Specifically, elements of a real-world image (e.g., persons, apparel, scenes) are first decomposed into multiple component images. These components are then recomposed into a single composite image according to a specified instruction.
This subset contains 11K samples in total.

## 📃 Data Structure

For Human-Centric, Object-Centric, and HOI tasks, the Parquet files share a unified schema with the following keys:

* `input_images`: a list of input images
* `output_image`: the composed image
* `instruction`: a descriptive or imperative instruction specifying how to compose the input images
* `separate_prompt`: captions corresponding to each input image
* `editing_type`: a label describing the task type

For the De&Re subset, the Parquet files contain the following keys:

* `reference`: the real image used for decomposition
* `input`: a list of images obtained by decomposing the `reference` image
* `output`: the recomposed image
* `instruction`: a description of how to compose the decomposed input images
* `separate_prompt`: captions corresponding to each decomposed input image
* `editing_type`: a label describing the task type

## 🖼️ Gallery

### Gallery: De&Re Subset

![image](https://cdn-uploads.huggingface.co/production/uploads/655db6a58c2d4379a70837c0/zpF3JXxGQOgxgw_miWpBm.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/655db6a58c2d4379a70837c0/BgjKyyjce1SBwgDeYlz1Z.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/655db6a58c2d4379a70837c0/o5rXn8aLRU9_lPJy0bO_j.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/655db6a58c2d4379a70837c0/VzO2weUE6Dr5RXP5soPOW.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/655db6a58c2d4379a70837c0/vuXsNw7GAGh617_pG0wVd.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/655db6a58c2d4379a70837c0/Jkr2V9fOQYvFYybjGRFZ1.png)

### Gallery: Human Centric

![image](https://cdn-uploads.huggingface.co/production/uploads/655db6a58c2d4379a70837c0/4lmUO4_UAwDYzKghryAVM.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/655db6a58c2d4379a70837c0/jmeVVoC_0XMOXSokZ7q5O.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/655db6a58c2d4379a70837c0/vCZv5zd_GYLnqxtTpGPyd.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/655db6a58c2d4379a70837c0/pZ85ZP8bUdAXmAlWUJEc-.png)

### Gallery: Object Centric

![image](https://cdn-uploads.huggingface.co/production/uploads/655db6a58c2d4379a70837c0/fWkOJu7GCkqADQbEFOTPf.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/655db6a58c2d4379a70837c0/JUwzoGOkGzHB2HSFH8dQl.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/655db6a58c2d4379a70837c0/Zuq8TlPi02GcKO7AyFy4a.png)

### Gallery: HOI

![image](https://cdn-uploads.huggingface.co/production/uploads/655db6a58c2d4379a70837c0/2dUWgRXVHlMToVL0S5VLX.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/655db6a58c2d4379a70837c0/dfaKo6ToJ4x6wIDMpnf4P.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/655db6a58c2d4379a70837c0/ijqhCNOXwkZDCTn4wfwbn.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/655db6a58c2d4379a70837c0/37cu8--VTo4BnzsvKlxtK.png)

## ✨ Citation

If you find this dataset or the associated work useful for your research, please cite the paper:

```bib
@article{wei2025mico,
  title={MICo-150K: A Comprehensive Dataset Advancing Multi-Image Composition},
  author={Wei, Xinyu and Cen, Kangrui and Wei, Hongyang and Guo, Zhen and Li, Bairui and Wang, Zeqing and Zhang, Jinrui and Zhang, Lei},
  journal={arXiv preprint arXiv:2512.07348},
  year={2025}
}
```
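As a quick consistency check on the Dataset Details section of the card, the per-configuration counts it lists can be tallied per category. This is a minimal sketch of our own, not part of the dataset's tooling: it only transcribes the stated counts (in thousands) and assumes each "K" figure is a round nominal value, so the actual number of rows in the Parquet files may differ.

```python
from typing import Dict, List

# Nominal sample counts in thousands, transcribed from "Dataset Details".
# Assumption: these are round figures; real Parquet row counts may differ.
NOMINAL_COUNTS_K: Dict[str, Dict[str, List[int]]] = {
    "Human-Centric": {
        "Two Persons": [3, 3, 3],                   # 2 males, 2 females, 1 male + 1 female
        "Three Persons": [3, 3, 3, 3],              # 3M, 3F, 1M + 2F, 2M + 1F
        "Person(s) + Scene": [6, 6],                # one person, two persons
    },
    "Object-Centric": {
        "Multi-Object Composition": [5, 5, 5, 5],   # 2, 3, 4, 5 objects
        "Object(s) + Scene": [5, 5],                # one object, two objects
    },
    "HOI": {
        "Person + Apparel": [6, 6, 6, 6],           # 1 to 4 apparel items
        "Person + Object": [6, 6, 6, 6],            # 1 or 2 persons x 1 or 2 objects
        "Person + Apparel + Object": [6, 6, 6, 6],  # 1H1C1O, 1H1C2O, 1H2C1O, 1H2C2O
    },
}
DE_RE_K = 11  # De&Re subset, stated as 11K in total


def category_totals(counts: Dict[str, Dict[str, List[int]]]) -> Dict[str, int]:
    """Sum every configuration count within each primary category."""
    return {
        category: sum(sum(configs) for configs in tasks.values())
        for category, tasks in counts.items()
    }


if __name__ == "__main__":
    totals = category_totals(NOMINAL_COUNTS_K)
    for category, thousands in totals.items():
        print(f"{category}: {thousands}K")
    primary = sum(totals.values())
    print(f"Primary categories: {primary}K")
    print(f"Including De&Re: {primary + DE_RE_K}K")
```

By this tally the three primary categories contribute 33K, 30K, and 72K samples (135K together), and adding the 11K De&Re samples gives roughly 146K, which the dataset name presumably rounds to 150K.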
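The two Parquet schemas described under Data Structure differ only in key names (`input`/`output` versus `input_images`/`output_image`) and in the extra `reference` column of the De&Re subset, so training code can map both onto a single record type. The sketch below is a hypothetical helper rather than the authors' loader: `CompositionSample`, `normalize_row`, and `load_rows` are names introduced here for illustration, reading relies on pyarrow's `pq.read_table(...).to_pylist()`, and images are passed through in whatever encoding the files store, since the card does not specify it.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class CompositionSample:
    """Hypothetical common record for rows from either documented schema."""
    input_images: List[Any]          # component images, encoding left as stored
    output_image: Any                # target composite image
    instruction: str                 # how to compose the inputs
    separate_prompt: Any             # per-input captions (exact type not documented)
    editing_type: str                # task label
    reference: Optional[Any] = None  # real source image; only De&Re rows have it


def normalize_row(row: dict) -> CompositionSample:
    """Map a row dict from either Parquet schema onto CompositionSample."""
    if "reference" in row:
        # De&Re schema: reference / input / output
        return CompositionSample(
            input_images=list(row["input"]),
            output_image=row["output"],
            instruction=row["instruction"],
            separate_prompt=row["separate_prompt"],
            editing_type=row["editing_type"],
            reference=row["reference"],
        )
    # Human-Centric, Object-Centric and HOI schema: input_images / output_image
    return CompositionSample(
        input_images=list(row["input_images"]),
        output_image=row["output_image"],
        instruction=row["instruction"],
        separate_prompt=row["separate_prompt"],
        editing_type=row["editing_type"],
    )


def load_rows(path: str) -> List[dict]:
    """Read one Parquet file into a list of row dicts (requires pyarrow)."""
    # Imported lazily so normalize_row can be used without pyarrow installed.
    import pyarrow.parquet as pq

    return pq.read_table(path).to_pylist()
```

For example, `samples = [normalize_row(r) for r in load_rows(path)]`, where `path` points at one of the downloaded Parquet files; the card does not list the file names, so `path` is left to the reader.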
Provided by:
BryceJia