MGCA: Lightweight Multimodal Gated Cross-Attention with Balance Regularization for Efficient Medical Image-Text Alignment

Name: MGCA: Lightweight Multimodal Gated Cross-Attention with Balance Regularization for Efficient Medical Image-Text Alignment
Creator: QingFang Zhang
License: No description available

IEEE, indexed 2026-04-17

Download link:

https://ieee-dataport.org/documents/mgca-lightweight-multimodal-gated-cross-attention-balance-regularization-efficient

Resource summary:

Cross-modal image-text matching in resource-constrained environments poses significant challenges due to difficulties in dynamic modality interaction and imbalanced fusion weighting. In this paper, we propose a lightweight multimodal fusion network, termed Multimodal Gated Cross-Attention (MGCA). The core innovations are: (1) a Multi-Head Gated Cross-Attention (MH-GCA) module, which introduces learnable temperature coefficients to adaptively regulate multi-granular cross-modal interactions; and (2) a Gated Balance Regularization (GBR) strategy that explicitly enforces modality weight equilibrium. Experimental results demonstrate that MGCA achieves an F1 score of 91.23% and an inference speed of 153 samples/second on Flickr30k, using only 1.5M parameters. Ablation studies validate the effectiveness of the multi-head gating mechanism, the balance regularization, and the linear projection module. Notably, MGCA reduces generalization error by 8.2% under low-resource domain adaptation (e.g., using only 10% of the training data) and outperforms other regularization baselines, including KL divergence [11]. This work presents a robust framework with significant advantages for resource-sensitive medical applications. MGCA enables real-time, accurate image-text alignment on edge devices (e.g., portable ultrasound), reducing diagnostic latency while maintaining reliability, which is critical for emergency medicine.
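The summary describes MH-GCA and GBR only at a high level. As a rough illustration, the following minimal pure-Python sketch shows how a multi-head gated cross-attention with per-head temperatures and a gate-balance penalty could be structured. It assumes standard scaled dot-product attention, a per-head sigmoid gate that mixes image context with text features, and a squared-deviation penalty toward an equal 50/50 modality weighting. All function names, the gate form, and the penalty form are hypothetical and are not taken from the authors' MGCA implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def cross_attention(query: Sequence[float], keys: Sequence[Sequence[float]],
                    values: Sequence[Sequence[float]], temperature: float) -> Vector:
    """Scaled dot-product attention from one query token over key/value tokens.

    `temperature` divides the logits: values < 1 sharpen the attention and
    values > 1 flatten it. In a trained model it would be a learnable scalar
    (assumption; not taken from the MGCA paper).
    """
    d = len(query)
    logits = [dot(query, k) / (math.sqrt(d) * temperature) for k in keys]
    weights = softmax(logits)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]


def gated_multi_head_fusion(text_tok: Sequence[float],
                            image_toks: Sequence[Sequence[float]],
                            temperatures: Sequence[float],
                            gate_w_text: Sequence[float],
                            gate_w_image: Sequence[float],
                            gate_bias: float = 0.0) -> Tuple[Vector, Vector]:
    """Hypothetical MH-GCA-style block: one text token attends to image tokens.

    The feature vector is split into len(temperatures) equal heads, and each
    head uses its own temperature. A per-head sigmoid gate g mixes the attended
    image context (weight g) with the text features (weight 1 - g).
    Returns (fused_features, per_head_gates).
    """
    d, h = len(text_tok), len(temperatures)
    if d % h != 0:
        raise ValueError("feature size must be divisible by the number of heads")
    hd = d // h
    fused: Vector = []
    gates: Vector = []
    for j, tau in enumerate(temperatures):
        sl = slice(j * hd, (j + 1) * hd)
        q = list(text_tok[sl])
        kv = [list(tok[sl]) for tok in image_toks]  # keys double as values, for brevity
        ctx = cross_attention(q, kv, kv, tau)
        g = sigmoid(dot(gate_w_text[sl], q) + dot(gate_w_image[sl], ctx) + gate_bias)
        gates.append(g)
        fused.extend(g * c + (1.0 - g) * x for c, x in zip(ctx, q))
    return fused, gates


def gate_balance_loss(gates: Sequence[float], target: float = 0.5) -> float:
    """Hypothetical balance penalty: squared gap between the mean gate and an
    equal 50/50 weighting of the two modalities. It is zero when balanced."""
    mean_gate = sum(gates) / len(gates)
    return (mean_gate - target) ** 2


if __name__ == "__main__":
    text = [1.0, 0.0, 0.5, -0.5]
    image = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
    fused, gates = gated_multi_head_fusion(text, image, temperatures=[0.5, 2.0],
                                           gate_w_text=[0.1] * 4,
                                           gate_w_image=[-0.1] * 4)
    # A training loop would add lambda_bal * gate_balance_loss(gates) to the
    # task loss; lambda_bal is a hypothetical weighting hyperparameter.
    print(fused, gates, gate_balance_loss(gates))
```

In this sketch, giving each head its own temperature lets some heads attend sharply to a few image tokens while others average broadly, which is one plausible reading of "multi-granular" interaction. The balance term only pushes the average gate toward 0.5, so individual heads remain free to favor one modality.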

Providing institution:

QingFang Zhang