OmniTry-Bench
ModelScope community · Updated 2025-12-05 · Indexed 2025-09-13
Download link:
https://modelscope.cn/datasets/Kunbyte/OmniTry-Bench
Resource description:
# OmniTry-Bench: A Comprehensive Benchmark for Virtual Try-on Anything
<p><b>OmniTry-Bench</b> is a comprehensive and diverse benchmark for the virtual try-on task, enabling thorough evaluation across 12 common types of wearable objects.</p>
For more details, visit the codebase of [OmniTry](https://github.com/OmniTry).
## Benchmark Composition
<img src='https://metac-open.oss-cn-hangzhou.aliyuncs.com/kunbyte/open_source/omnitry/benchmark.png' width='99%' />
As shown in the figure above, we gather evaluation samples across 12 common types of wearable objects, which fall into 4 major classes: clothes, shoes, jewelry, and accessories.
Where necessary, we define detailed sub-types; for example, the class <i>bag</i> consists of backpack, shoulder, and tote bags.
<i>Clothes</i> are divided into top cloth, bottom cloth, and dress. Each sub-type contains two gender groups (woman and man), with the exceptions that <i>jewelries</i> and <i>dress</i> contain only woman samples, while <i>tie</i> contains only man samples.
Each gender group includes 15 person images. The garments are categorized into three settings: white background, natural background, and try-on setting, with 5 images per setting.
Following previous work's categorization of virtual try-on scenarios into <i>in-shop</i> and <i>in-the-wild</i>, we further divide the person images for <i>clothes</i> and <i>shoes</i> into 15 shop-style and 15 wild-style samples per gender group, resulting in 30 person images per sub-type. Other person images are labeled with <i>wild</i> or <i>shop</i> at the end of their filenames.
The benchmark predominantly sources images from public repositories ([Pexels](https://www.pexels.com)), supplemented with brand website materials and social media content under compliant data usage protocols.
### Benchmark Structure
Make sure the downloaded paired (person-object) dataset is organized according to the following folder structure:
```
OmniTry_Bench/
├── bag/
|   ├── backpack/
|   |   └── man/
|   |       ├── object/
|   |       |   ├── clean/
|   |       |   |   ├── 101_backpack_XX-color.jpg
|   |       |   |   └── ...
|   |       |   ├── natural/
|   |       |   |   ├── 201_backpack_XX-brown.jpg
|   |       |   |   └── ...
|   |       |   └── tryon/
|   |       |       ├── 301_backpack_XX-gray.jpg
|   |       |       └── ...
|   |       └── person/
|   |           ├── 001_backpack_XX_shop.jpg
|   |           └── ...
|   ├── shoulder/
|   |   └── woman/
|   |       └── ...
|   └── tote
|       └── ...
|
...
└── tie/
    └── man/
        └── ...
- omni_vtryon_bench_v1.json
- omni_vtryon_bench_small_v1.json
```
- `bag/tie`: The object types
- `backpack/shoulder`: The object subtypes
- `woman/man`: The gender groups
- `object`: The wearable object images, with the <i>id</i> at the beginning of the filename and the object's main color at the end.
- `person`: The persons in the shop/wild style, with the <i>id</i> at the beginning of the filename.
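The naming convention above can be sketched as a pair of small parsers. This is only an illustration, not part of the benchmark's tooling: the function names are hypothetical, and it assumes that the ID is the text before the first underscore, that the color follows the last hyphen, and that the style tag (when present) is the last underscore-separated token.

```python
from pathlib import PurePosixPath


def parse_object_filename(path: str) -> dict:
    """Split an object image name such as '101_backpack_XX-color.jpg'."""
    stem = PurePosixPath(path).stem
    item_id, _, rest = stem.partition("_")
    # Assumption: the main color is whatever follows the last hyphen.
    color = rest.rsplit("-", 1)[-1] if "-" in rest else None
    return {"id": item_id, "color": color}


def parse_person_filename(path: str) -> dict:
    """Split a person image name such as '001_backpack_XX_shop.jpg'."""
    stem = PurePosixPath(path).stem
    item_id, _, rest = stem.partition("_")
    # Assumption: the style tag is the last underscore-separated token.
    tag = rest.rsplit("_", 1)[-1]
    style = tag if tag in ("shop", "wild") else None
    return {"id": item_id, "style": style}
```

A filename without a recognizable `shop`/`wild` suffix yields `style=None` rather than a guess, so callers can decide how to handle it.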
There are two try-on index JSON files. `omni_vtryon_bench_v1.json` contains the full benchmark with 6,975 combinatorial person-object pairs across 12 wearable categories. Its subset, `omni_vtryon_bench_small_v1.json`, provides 360 curated image pairs sampled under balanced constraints (15 models per type, with a 7 shop-style/8 wild-style split) and serves as the core evaluation set for virtual try-on experiments. Both files include metadata annotations for all object types and environmental settings.
Each try-on pair in the JSON files has the following structure:
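As a quick sanity check after download, an index file can be loaded and its pairs tallied per category. The sketch below assumes that the top level of each JSON file is an array of pair records and that every record has a `garment_class` field; the helper name `count_pairs_by_class` is hypothetical.

```python
import json
from collections import Counter


def count_pairs_by_class(index_path: str) -> Counter:
    """Count try-on pairs per garment class in an index file."""
    with open(index_path, encoding="utf-8") as f:
        pairs = json.load(f)  # assumption: a top-level JSON array of records
    return Counter(pair["garment_class"] for pair in pairs)
```

If the assumption holds, `sum(count_pairs_by_class("omni_vtryon_bench_v1.json").values())` should match the 6,975 pairs stated above, and the small index should give 360.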
```
{
    "id": "bag_backpack_man_001_101",
    "person": {
        "id": "001",
        "img_path": "OmniTry_Bench/bag/backpack/man/person/001_backpack_back-of-mens-white-shirt_shop.jpg",
        "caption": "the back view of a person standing against a plain white background. The individual is wearing a plain white T-shirt and light green shorts. The person has short, light brown hair that is neatly styled. The posture is relaxed, with the arms hanging naturally by the sides. The lighting in the image is even, highlighting the simplicity of the outfit and the clean background."
    },
    "object": {
        "id": "101",
        "img_path": "OmniTry_Bench/bag/backpack/man/object/clean/101_backpack_R-C-color.jpg",
        "caption": "Jurassic Park-themed backpack with black, yellow, and red accents."
    },
    "gt": {
        "caption": "A young man standing in a studio with a white background. He is wearing a white t-shirt with a crew neck and short sleeves. His hair is styled neatly, and he is facing away from the camera. He wears light green shorts. The man is now wearing a black backpack with yellow and red accents, featuring the Jurassic Park logo prominently displayed.",
        "caption_cate": "the back view of a person standing against a plain white background. The individual is wearing a plain white T-shirt and light green shorts. The person has short, light brown hair that is neatly styled. The posture is relaxed, with the arms hanging naturally by the sides. The lighting in the image is even, highlighting the simplicity of the outfit and the clean background. Wearing a bag on shoulder or in hand. Jurassic Park-themed backpack with black, yellow, and red accents."
    },
    "garment_class": "bag",
    "class_name": "bag_backpack_man"
},
```
- `id`: The ID of the try-on pair, formed by concatenating the class name, the person ID, and the object ID.
- `person`: The person information, including the person ID, the person image path, and a caption generated by the Qwen2.5-VL MLLM ([Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)).
- `object`: The wearable object information, including the object ID, the object image path, and a caption generated by the same MLLM.
- `garment_class`: The category name of the object.
- `class_name`: The category name of the try-on pair.
- `gt`: Its `caption` is a descriptive prompt generated by the same MLLM.
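The relationships between these fields can be checked mechanically. The following is a minimal sketch, not an official validator: the function name `check_pair` is hypothetical, the underscore separator is inferred from the sample record, and the filename check relies on the convention that image names begin with their ID.

```python
from pathlib import PurePosixPath


def check_pair(pair: dict) -> list[str]:
    """Return a list of consistency problems found in one pair record."""
    problems = []

    # `id` is described as class name + person ID + object ID; the
    # underscore separator is an assumption taken from the sample record.
    expected_id = "_".join(
        [pair["class_name"], pair["person"]["id"], pair["object"]["id"]]
    )
    if pair["id"] != expected_id:
        problems.append(f"id {pair['id']!r} != expected {expected_id!r}")

    # Image filenames are described as starting with the item's ID.
    for role in ("person", "object"):
        name = PurePosixPath(pair[role]["img_path"]).name
        if not name.startswith(pair[role]["id"] + "_"):
            problems.append(f"{role} filename {name!r} does not start with its id")

    return problems
```

An empty list means the record is consistent with the description above; running this over every record of an index file gives a cheap integrity check before evaluation.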
Provider:
maas
Created:
2025-09-03



