MGCA: Lightweight Multimodal Gated Cross-Attention with Balance Regularization for Efficient Medical Image-Text Alignment

Source: IEEE, indexed 2026-04-17
Download link:
https://ieee-dataport.org/documents/mgca-lightweight-multimodal-gated-cross-attention-balance-regularization-efficient
Resource description:
Cross-modal image-text matching in resource-constrained environments poses significant challenges due to difficulties in dynamic modality interaction and imbalanced fusion weighting. In this paper, we propose a lightweight multimodal fusion network, termed Multimodal Gated Cross-Attention (MGCA). The core innovations include: (1) a Multi-Head Gated Cross-Attention (MH-GCA) module, which introduces learnable temperature coefficients to adaptively regulate multi-granular cross-modal interactions; and (2) a Gated Balance Regularization (GBR) strategy that explicitly enforces modality weight equilibrium. Experimental results demonstrate that MGCA achieves an F1 score of 91.23% and an inference speed of 153 samples/second on Flickr30k, using only 1.5M parameters. Ablation studies validate the effectiveness of the multi-head gating mechanism, balance regularization, and linear projection module. Notably, MGCA reduces generalization error by 8.2% under low-resource domain adaptation (e.g., using only 10% of the training data), and outperforms other regularization baselines, including KL divergence [11]. This work presents a robust framework, demonstrating significant advantages for resource-sensitive medical applications. MGCA enables real-time, accurate image-text alignment on edge devices (e.g., portable ultrasound), reducing diagnostic latency while maintaining reliability, which is critical for emergency medicine.
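The description names two mechanisms without giving their formulas: temperature-scaled gated cross-attention and a balance regularizer on the modality weights. The following is a minimal single-head sketch of how such pieces might fit together, not the authors' implementation. Every function name here is hypothetical, the temperature and gate logit are fixed numbers standing in for learnable parameters, and the penalty form (squared deviation of the mean gate from 0.5) is an assumption chosen only to illustrate "modality weight equilibrium".

```python
import math


def softmax(scores, temperature):
    """Temperature-scaled softmax; a lower temperature sharpens the weights."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attend(query, keys, values, temperature):
    """One query vector from one modality attends over the other modality's tokens."""
    weights = softmax([dot(query, k) for k in keys], temperature)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(image_vec, text_vec, gate_logit):
    """Blend two modality vectors with a scalar gate g in (0, 1)."""
    g = sigmoid(gate_logit)
    fused = [g * i + (1.0 - g) * t for i, t in zip(image_vec, text_vec)]
    return fused, g


def balance_penalty(gates):
    """Assumed balance term: squared deviation of the mean gate from 0.5."""
    mean_gate = sum(gates) / len(gates)
    return (mean_gate - 0.5) ** 2


# Toy usage: a text query attends over two image tokens, then the two
# modalities are fused with a neutral gate (logit 0.0).
image_tokens = [[1.0, 0.0], [0.0, 1.0]]
text_query = [1.0, 0.0]
attended = cross_attend(text_query, image_tokens, image_tokens, temperature=0.5)
fused, gate = gated_fusion(attended, text_query, gate_logit=0.0)
penalty = balance_penalty([gate])
```

In a trained model the penalty would presumably be added to the matching loss with a weighting coefficient, so that the gate is discouraged from collapsing onto a single modality; that coefficient and the multi-head arrangement are not specified in the description above.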
Provided by:
QingFang Zhang