Virtual Domain-Guided Cross-Modal Distillation with Multi-View Correlation Awareness for Domain-Specific Multi-Modal Neural Mach
Source: IEEE; indexed 2026-04-17
Download link:
https://ieee-dataport.org/documents/virtual-domain-guided-cross-modal-distillation-multi-view-correlation-awareness-domain
Description:
We conducted experiments on five benchmark multi-modal neural machine translation (MNMT) datasets: two domain-specific datasets, Fashion-MMT and EMMT; one multi-domain dataset, Wikipedia Image Text (WIT); and two general-domain datasets, Multi-30k and Flickr30kEnt-JP. Specifically: 1) Fashion-MMT is a large-scale English-Chinese (En-Zh) product description dataset comprising 114,257 automatically generated translations (Fashion-MMT(Large)) and 40,000 manually translated descriptions (Fashion-MMT(Small)). Each instance is a triplet of an English description, its Chinese translation, and several corresponding images. 2) EMMT is a real-world En-Zh e-commerce multi-modal dataset containing 22,500 triplets annotated by professional translators. 3) WIT is a large-scale multimodal dataset of image-text pairs collected from Wikipedia articles across multiple languages and domains. 4) Multi-30k is a widely used general-domain MNMT dataset containing 29,000, 1,014, and 1,000 text-image pairs in the training, validation, and test sets, respectively. 5) Flickr30kEnt-JP is a Japanese-translated version of the Flickr30k Entities dataset; it contains 31,783 images with 5 Japanese captions per image, for a total of 158,915 image-text pairs.
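As a quick sanity check on the sizes quoted in the description, the following minimal sketch recomputes the derived totals. The dictionary layout, key names, and helper functions are illustrative assumptions and are not part of the dataset release; only the numbers come from the description. WIT is omitted because no size is stated for it.

```python
# Sizes copied from the description; key names are hypothetical labels.
DATASET_SIZES = {
    "Fashion-MMT(Large)": 114_257,  # automatically generated translations
    "Fashion-MMT(Small)": 40_000,   # manually translated descriptions
    "EMMT": 22_500,                 # triplets annotated by professional translators
    "Multi-30k/train": 29_000,
    "Multi-30k/valid": 1_014,
    "Multi-30k/test": 1_000,
}

FLICKR30KENT_JP_IMAGES = 31_783
CAPTIONS_PER_IMAGE = 5


def total(prefix: str) -> int:
    """Sum the sizes of all entries whose label starts with `prefix`."""
    return sum(n for name, n in DATASET_SIZES.items() if name.startswith(prefix))


def flickr30kent_jp_pairs() -> int:
    """Image-text pairs implied by the image count and captions per image."""
    return FLICKR30KENT_JP_IMAGES * CAPTIONS_PER_IMAGE


if __name__ == "__main__":
    print("Fashion-MMT total:", total("Fashion-MMT"))
    print("Multi-30k total:", total("Multi-30k"))
    print("Flickr30kEnt-JP pairs:", flickr30kent_jp_pairs())
```

The product of 31,783 images and 5 captions per image agrees with the 158,915 pairs stated in the description.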
Provided by:
Zhenyu Hou



