260063 Enhance publishing | A Multimodal Sentiment Analysis Model with Multi-source Knowledge guided Visual Confidence Perception
DataCite Commons | Updated 2026-04-16 | Indexed 2026-05-05
Download link:
https://www.scidb.cn/detail?dataSetId=9a9ddc4a27e8416c9941d1d663c7bf47
Description:
Citation notice: If you use this open-source content in papers, books, academic reports, or other works, please cite the following paper (the original link carries the latest citation format):

PENG Juhong, ZHANG Zhi, LIU Peng, GE Wenhui, LIU Chen, LIAO Lingxin, ZHANG Kai. A Multimodal Sentiment Analysis Model with Multi-source Knowledge guided Visual Confidence Perception[J]. Journal of Electronics & Information Technology, in press. doi: 10.11999/JEIT260063

Authors: PENG Juhong, ZHANG Zhi, LIU Peng, GE Wenhui, LIU Chen, LIAO Lingxin, ZHANG Kai
Affiliations: ① School of Artificial Intelligence, HuBei University. ② Key Laboratory of Intelligent Perception Systems and Security, Ministry of Education. ③ Wuchang Shipbuilding Industry Group Co., Ltd.
DOI: 10.11999/JEIT260063
Original: https://jeit.ac.cn/cn/article/doi/10.11999/JEIT260063
Corresponding author: ZHANG Kai, 29859491@qq.com
Open source date: April 13, 2026
Funding: National Natural Science Foundation of China (62377009)

Abstract:

Objective: Multimodal sentiment analysis is hindered by visual environmental noise, sentiment inconsistencies between images and texts, and imbalanced modal contributions. Treating all modalities indiscriminately allows visual noise to severely degrade model performance. A robust mechanism that evaluates visual confidence and filters redundant noise is therefore needed.

Methods: A Multimodal Sentiment Analysis Model with Multi-source Knowledge guided Visual Confidence Perception (MKVP) is proposed (Fig. 1). A Multi-source Knowledge Guidance Matrix is constructed using syntactic, sentiment, and aspect-focused operators (Fig. 2). Driven by this matrix, a Visual Confidence Perception (VCP) module is designed to measure semantic affinity and dynamically suppress irrelevant visual noise (Fig. 3). Finally, a dual-stream parallel interaction module facilitates deep cross-modal alignment, and a global gated mechanism dynamically adjusts modality fusion weights.

Results and Discussions: Extensive experiments are conducted on the MVSA-Single, MVSA-Multiple, and HFM datasets, where the proposed MKVP model consistently outperforms mainstream baseline models in terms of accuracy and F1 score (Table 3). Ablation studies demonstrate the indispensable role of each component, particularly the VCP module, in filtering visual noise and improving feature quality (Table 5). Furthermore, feature space visualizations confirm that the VCP module effectively purifies semantic representations by driving data points with the same sentiment polarity to cluster distinctly (Fig. 4). Case analyses on mismatched samples further verify the model's superiority in resolving cross-modal semantic conflicts (Table 6). The evaluation of model complexity also indicates high computational efficiency with low inference latency (Table 8).

Conclusions: The proposed MKVP framework effectively mitigates the issues of visual noise and sentiment inconsistency in multimodal sentiment analysis. By leveraging multi-source knowledge to guide visual confidence perception, and integrating dual-stream interaction with dynamic gated fusion, the model achieves highly robust sentiment representations. This approach provides an efficient and reliable solution for analyzing complex, noisy multimodal data in real-world social media scenarios.
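
Illustrative note: the abstract describes confidence-based suppression of visual features and gated modality fusion only at a high level, so the following is a minimal sketch of the general idea and not the authors' MKVP implementation. Every name here (`visual_confidence`, `pool_visual`, `gated_fusion`, the `sharpness` parameter, the scalar gate) is a hypothetical choice made for illustration. It assumes the text side is summarized as one vector, the image as a list of region vectors of the same dimension, and that cosine similarity stands in for the "semantic affinity" the abstract mentions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def visual_confidence(text_vec, regions, sharpness=5.0):
    """Hypothetical per-region confidence in (0, 1): sigmoid of scaled text-region affinity."""
    return [sigmoid(sharpness * cosine(text_vec, r)) for r in regions]

def pool_visual(regions, conf):
    """Confidence-weighted mean of region features; low-confidence regions contribute little."""
    total = sum(conf)  # always > 0 because each sigmoid output is positive
    dim = len(regions[0])
    return [sum(c * r[i] for c, r in zip(conf, regions)) / total for i in range(dim)]

def gated_fusion(text_vec, visual_vec, gate_weights, gate_bias=0.0):
    """Assumed scalar gate g over the concatenated features; fused = g*text + (1-g)*visual."""
    z = sum(w * x for w, x in zip(gate_weights, text_vec + visual_vec)) + gate_bias
    g = sigmoid(z)
    return g, [g * t + (1.0 - g) * v for t, v in zip(text_vec, visual_vec)]

# Toy usage: one region agrees with the text direction, one opposes it.
text = [1.0, 0.0]
regions = [[0.9, 0.1], [-1.0, 0.0]]
conf = visual_confidence(text, regions)
pooled = pool_visual(regions, conf)
gate, fused = gated_fusion(text, pooled, gate_weights=[0.0, 0.0, 0.0, 0.0])
```

In this toy setup the region aligned with the text receives a confidence near 1 and the opposing region a confidence near 0, so the pooled visual vector is dominated by the aligned region. With all-zero gate weights the gate is exactly 0.5 and the fused vector is the plain average of the two modalities; in a trained model the gate parameters would be learned.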
Provider:
Science Data Bank
Created:
2026-04-16



