Multimodal Intent Recognition Based on Attention Modality Fusion
China Scientific Data | Updated 2026-03-16 | Indexed 2026-04-25
Download link:
https://www.sciengine.com/AA/doi/10.19678/j.issn.1000-3428.0069955
Resource description:
Intent recognition plays an important role in natural language understanding. Previous research has focused primarily on single-modal intent recognition for specific tasks. In real-world scenarios, however, human intentions are complex and must be inferred by integrating information such as language, tone, expressions, and actions. A novel attention-based multimodal fusion method is therefore proposed for intent recognition in real-world multimodal scenarios. A separate self-attention mechanism is applied to the features of each modality in order to capture and integrate long-range dependencies between modalities, adaptively adjust the importance of the information from each modality, and provide richer representations. Explicit modality identifiers are added to the data of each modality so that the model can distinguish and effectively fuse information from different modalities, which enhances its overall understanding and decision-making capabilities. Because textual information is especially important in cross-modal interactions, fusion is performed with a cross-attention mechanism in which text is the primary modality and the other modalities assist and guide the interactions. This design aims to facilitate interactions among the textual, visual, and auditory modalities. Experiments were conducted on the MIntRec and MIntRec2.0 benchmark datasets for multimodal intent recognition. The results show that the model outperforms existing multimodal learning methods in accuracy, precision, recall, and F1 score, improving on the current best baseline model by 0.1 to 0.5 percentage points.
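The three components named in the description (per-modality self-attention, explicit modality identifiers, and text-led cross-attention) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the feature dimension, the sequence lengths, the order of the steps, the residual connections, the mean pooling, and the absence of learned projection matrices are all assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, key, value):
    # Scaled dot-product attention; learned projections are omitted (assumption).
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ value

def fuse(text, video, audio, modality_ids):
    # Hypothetical fusion routine; the step order is an assumption.
    feats = {"text": text, "video": video, "audio": audio}
    # 1. Add an explicit modality identifier vector to every position.
    tagged = {name: x + modality_ids[name] for name, x in feats.items()}
    # 2. Apply a separate self-attention pass to each modality (with a residual).
    refined = {name: x + attention(x, x, x) for name, x in tagged.items()}
    # 3. Cross-attention with text as the primary modality:
    #    text supplies the queries, video and audio supply keys and values.
    t = refined["text"]
    fused = (t
             + attention(t, refined["video"], refined["video"])
             + attention(t, refined["audio"], refined["audio"]))
    # Mean-pool over text positions to get one utterance-level vector (assumption).
    return fused.mean(axis=0)

# Toy inputs: random features with an assumed shared dimension of 8.
rng = np.random.default_rng(0)
dim = 8
text = rng.normal(size=(5, dim))    # 5 text tokens
video = rng.normal(size=(7, dim))   # 7 video frames
audio = rng.normal(size=(6, dim))   # 6 audio frames
modality_ids = {name: rng.normal(size=dim) for name in ("text", "video", "audio")}

intent_vector = fuse(text, video, audio, modality_ids)
print(intent_vector.shape)
```

In a full model, the pooled vector would typically be passed to a classifier over the intent labels. Using text as the only query source keeps the fused sequence aligned with the text tokens, which matches the description of text as the primary modality.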
Created:
2026-03-16



