Multi-modal multi-task internet sentiment recognition model based on mid-to-back fusion strategy
Source: Figshare | Updated: 2025-12-15 | Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/_b_Multi-modal_multi-task_b_b_internet_sentiment_recognition_model_based_on_mid-to-back_fusion_strategy_b_/30885701
Description:
Previous studies in multimodal emotion recognition have not adequately addressed the disparity between unimodal expression and multimodal perception in emotion labeling, nor have they optimized multimodal fusion strategies. This paper introduces a multimodal multi-task emotion recognition model based on a mid-to-late fusion strategy. The model uses BERT, Wav2Vec2.0, and CLIP to extract features from the text, audio, and visual modalities, respectively, and these features are further refined by convolutional layers and self-attention mechanisms. Independent unimodal emotion labels are used to train the unimodal models, optimizing each modality's emotional representation. The representations from the three modalities are then integrated by a multi-head attention mechanism, and a multi-task learning strategy captures the disparities between unimodal and multimodal representations. Experiments on the Chinese CH-SIMS dataset show that the proposed model significantly outperforms the baseline models, with a 3% increase in accuracy and a 1.74% increase in F1 score. Despite these advances, the model still has room for improvement in handling dataset diversity, the complexity of real-world applications, non-standard language expressions, and cross-cultural emotion understanding. The proposed model not only improves the effectiveness of multimodal fusion but also offers a new perspective for the development of multimodal emotion recognition technology.
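The description gives the fusion and multi-task stages only at a high level, so the following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. It assumes each encoder branch (BERT, Wav2Vec2.0 or CLIP, followed by the convolution and self-attention refinement, which is omitted here) has already produced one pooled vector of the same size per modality. Fusion uses a parameter-free multi-head attention whose query is the mean of the modality vectors, and the multi-task objective is assumed to be a weighted sum of L1 losses on the multimodal label `M` and the unimodal labels `T`, `A`, `V`. The function names, the task weights, and the choice of L1 loss are all hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multihead_attention_fuse(feats: Dict[str, Vector], num_heads: int) -> Vector:
    """Fuse per-modality vectors with a parameter-free multi-head attention (hypothetical).

    Each d-dimensional vector is split into `num_heads` chunks. In every head the
    query is the mean of the modality chunks, the keys and values are the chunks
    themselves, and the head output is the softmax-weighted sum of the values.
    Learned Q/K/V projections are omitted for clarity.
    """
    vecs = list(feats.values())
    dim = len(vecs[0])
    if any(len(v) != dim for v in vecs) or dim % num_heads:
        raise ValueError("modality vectors must share a dim divisible by num_heads")
    head_dim = dim // num_heads
    fused: Vector = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        chunks = [v[lo:hi] for v in vecs]
        # Query: element-wise mean of this head's chunks across modalities.
        query = [sum(c[i] for c in chunks) / len(chunks) for i in range(head_dim)]
        # Scaled dot-product score of the query against each modality chunk.
        scores = [sum(q * k for q, k in zip(query, c)) / math.sqrt(head_dim) for c in chunks]
        weights = softmax(scores)
        # Weighted sum of the modality chunks, appended as this head's output.
        fused.extend(sum(w * c[i] for w, c in zip(weights, chunks)) for i in range(head_dim))
    return fused


def multitask_loss(preds: Dict[str, float], labels: Dict[str, float],
                   task_weights: Dict[str, float]) -> float:
    """Weighted sum of per-task L1 losses: multimodal 'M' plus unimodal 'T', 'A', 'V' (assumed)."""
    return sum(w * abs(preds[t] - labels[t]) for t, w in task_weights.items())


if __name__ == "__main__":
    # Toy 4-dimensional "pooled features" standing in for the encoder outputs.
    feats = {
        "text": [1.0, 0.0, 0.5, 0.5],
        "audio": [0.0, 1.0, 0.5, 0.5],
        "vision": [1.0, 1.0, 0.5, 0.5],
    }
    print(multihead_attention_fuse(feats, num_heads=2))
    preds = {"M": 0.4, "T": 0.2, "A": -0.2, "V": 0.6}
    labels = {"M": 0.6, "T": 0.2, "A": 0.0, "V": 0.6}
    print(multitask_loss(preds, labels, {"M": 1.0, "T": 0.3, "A": 0.3, "V": 0.3}))
```

A trained model would learn the query, key, and value projections and the prediction heads; keeping them fixed here only makes the data flow visible. Giving the unimodal tasks a smaller weight than the multimodal task (0.3 versus 1.0 in the example) is one possible way to let unimodal supervision shape each branch without dominating the final prediction.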
Created:
2025-12-15



