A Method for Infrared and Visible Image Fusion Based on CNN-Mamba Feature Extraction Combined with Information Selection

China Scientific Data · Updated 2026-03-19 · Indexed 2026-04-25

Download link:

https://www.sciengine.com/AA/doi/10.3788/gzxb20265501.0110001

Resource description:

Infrared sensors capture the thermal radiation emitted from the surface of objects to create images. However, infrared images are prone to noise interference and are limited in capturing fine details and texture information. In contrast, visible sensors capture reflected light from the surface of objects, providing rich details and texture information that align with human visual perception. However, the quality of visible images can be unstable due to factors such as lighting conditions. Therefore, effectively fusing infrared and visible images to generate high-quality images is an important research direction.

Currently, fusion methods for infrared and visible images can be broadly categorised into traditional fusion strategies and deep learning-based fusion strategies. Traditional methods struggle to effectively integrate the feature information from the two modalities, especially in complex fusion scenarios. Deep learning-based fusion methods, such as those built on ResNet and DenseNet, can better handle multi-modal information, but they involve complex training processes and high computational overhead. To address these challenges, this paper proposes an infrared and visible image fusion method that combines CNN-Mamba feature extraction with an information selection mechanism. The aim is to overcome the shortcomings of traditional methods in complex scenarios while simplifying the training process of deep learning-based fusion strategies. In the image encoding phase, we introduce the Spatial Context Aware Module (SCAM) for the first time in the field of image fusion; it integrates contextual feature information and allocates attention weights adaptively. Then, we design the CNN-Mamba Feature Extract (CMFE) module, which integrates CNN and Mamba networks to perform deep feature extraction on both image modalities.

In the fusion phase, considering the information differences between infrared and visible images of the same scene, we propose a novel adaptive weight selection fusion layer to retain more source information. Specifically, we calculate the normalised mutual information between the two modalities and set a threshold. When the normalised mutual information exceeds the threshold, the two fusion weights are set equal. When the normalised mutual information is below the threshold, we assign a higher weight to the modality with higher information entropy to preserve more source information. Finally, in the decoding phase, we utilise a decoder consisting of four CNN modules to progressively reduce the number of channels and reconstruct the fused image.

In the experimental phase, ablation studies validate the rationality of the proposed network structure and loss function design. When compared with seven advanced fusion methods, our method demonstrates excellent performance on the MSRS, Road-Scene, TNO, and M3FD public datasets, ranking first in at least three objective metrics. Moreover, the EN and CC values are the highest in all tests. This indicates that the proposed method effectively facilitates cross-modal information interaction and preserves the original information as much as possible. Additionally, pedestrian and vehicle detection experiments were conducted on the Road-Scene dataset, and the results show that our method achieved the best performance in both pedestrian and vehicle detection, further validating its practical application advantages.

In conclusion, the proposed method demonstrates significant advantages in both image fusion quality and subsequent applications.
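
The adaptive weight selection rule described for the fusion phase can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the normalisation used for mutual information, the threshold value, or the exact weight formula, so the sketch assumes NMI = 2·I(A;B) / (H(A) + H(B)), a hypothetical threshold of 0.5, and weights proportional to each modality's entropy. The function names are illustrative, and the inputs are flat sequences of quantised intensities rather than the deep feature maps the paper presumably operates on.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def normalised_mutual_information(a, b):
    """NMI = 2 * I(A;B) / (H(A) + H(B)), which lies in [0, 1].

    Assumption: this is one common normalisation; the paper may use another.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 1.0  # both inputs constant: treat them as fully redundant
    h_joint = entropy(list(zip(a, b)))  # entropy of paired values
    mutual_information = h_a + h_b - h_joint
    return 2.0 * mutual_information / (h_a + h_b)


def fusion_weights(ir, vis, threshold=0.5):
    """Return (w_ir, w_vis) following the rule described in the abstract.

    NMI above the threshold -> equal weights.
    Otherwise -> the modality with higher entropy gets the higher weight
    (here: weights proportional to entropy, an assumed formula).
    The threshold default of 0.5 is hypothetical.
    """
    if normalised_mutual_information(ir, vis) > threshold:
        return 0.5, 0.5
    h_ir, h_vis = entropy(ir), entropy(vis)
    total = h_ir + h_vis
    if total == 0:
        return 0.5, 0.5  # no information in either input
    return h_ir / total, h_vis / total


# Toy inputs: the visible signal is independent of the infrared one
# and has twice its entropy, so it receives the larger weight.
ir = [0, 0, 0, 0, 1, 1, 1, 1]
vis = [0, 1, 2, 3, 0, 1, 2, 3]
w_ir, w_vis = fusion_weights(ir, vis)
```

In this toy case the two inputs share no information, so the entropy branch is taken and the visible input is weighted more heavily. Identical inputs would instead give an NMI of 1 and equal weights.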

Created:

2026-02-04
