Clothing Retrieval Based on Multiple Prompts and Contrastive Image-Text Learning
Source: China Scientific Data. Updated 2026-02-09; indexed 2026-04-25.
Download link:
https://www.sciengine.com/AA/doi/10.19678/j.issn.1000-3428.0069773
Resource description:
With the continuous development of multimodal learning, the field of image retrieval faces new opportunities and challenges. Most existing clothing retrieval models perform unimodal retrieval with convolutional neural networks or Transformers and ignore the rich textual information that accompanies the images. As a result, the features these models learn tend to lack diversity. This study proposes a clothing retrieval method based on multiple prompts and contrastive image-text learning. It introduces image and text multiprompt learning to guide a large multimodal model, FashionCLIP, in learning multidimensional, semantically rich, multimodal features of clothing. To improve the retrieval ability of the model and fully exploit its multimodal potential, the model is optimized in two stages. In the first stage, the image and text encoders are frozen and the text prompt is optimized using image and text cross-entropy loss functions. In the second stage, the text prompt and text encoder are frozen, and the image prompt and image encoder are optimized using triplet loss, classification loss, and image and text cross-entropy loss functions. Both intra-domain and cross-domain retrieval experiments were conducted on the Taobao Live multimodal video product retrieval dataset, known as WAB. For intra-domain retrieval, the method improves the mean Average Precision (mAP) by at least 6.1 percentage points and Rank-1 by at least 3.5 percentage points over traditional models. For cross-domain retrieval, it improves mAP by at least 8.4 percentage points and Rank-1 by at least 6.4 percentage points over traditional models. The retrieval results themselves are also markedly better, demonstrating the potential of contrastive learning in the field of clothing retrieval.
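The two-stage schedule and the loss terms named in the description can be sketched as follows. This is a minimal illustration and not the authors' implementation: the description does not give formulas, so the sketch assumes the image and text cross-entropy losses take the usual CLIP-style form (softmax cross-entropy over a cosine-similarity matrix, in both directions) and that the triplet loss is the standard margin-based form on Euclidean distance. The names STAGES, image_text_loss, triplet_loss, and the temperature and margin values are hypothetical. The classification loss is listed in the schedule but omitted from the code.

```python
import math

# Which components train in each stage, as stated in the description.
# The dictionary layout and key names are illustrative assumptions.
STAGES = {
    1: {"frozen": ["image_encoder", "text_encoder"],
        "trainable": ["text_prompt"],
        "losses": ["image_text_cross_entropy"]},
    2: {"frozen": ["text_prompt", "text_encoder"],
        "trainable": ["image_prompt", "image_encoder"],
        "losses": ["triplet", "classification", "image_text_cross_entropy"]},
}

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    return [a / norm for a in v]

def _cross_entropy(logits, target):
    # Numerically stable -log softmax(logits)[target].
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]

def image_text_loss(img_feats, txt_feats, temperature=0.07):
    """Assumed CLIP-style loss: row k of each list is a matched pair.

    Returns (image-to-text loss, text-to-image loss), each averaged
    over the batch.
    """
    imgs = [_normalize(v) for v in img_feats]
    txts = [_normalize(v) for v in txt_feats]
    n = len(imgs)
    sim = [[_dot(i, t) / temperature for t in txts] for i in imgs]
    i2t = sum(_cross_entropy(sim[k], k) for k in range(n)) / n
    t2i = sum(_cross_entropy([sim[r][k] for r in range(n)], k)
              for k in range(n)) / n
    return i2t, t2i

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Assumed margin-based triplet loss on Euclidean distance."""
    d_ap = math.dist(anchor, positive)
    d_an = math.dist(anchor, negative)
    return max(0.0, d_ap - d_an + margin)
```

Under these assumptions, a batch whose image and text features are correctly paired yields a lower image-text loss than the same batch with the text rows shuffled, and the triplet loss is zero once the negative is farther from the anchor than the positive by at least the margin.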
Created:
2026-02-09



