Multimodal feature interaction and semantic-guided fusion for RGB-T crowd counting
Source: China Scientific Data · Updated: 2026-01-15 · Indexed: 2026-04-25
Download link:
https://www.sciengine.com/AA/doi/10.13700/j.bh.1001-5965.2023.0735
Resource description:
RGB-T crowd counting exploits the complementarity of visible (RGB) and thermal infrared images to count the people in a scene. To address the insufficient information interaction between modalities and the inadequate feature fusion during feature extraction in existing RGB-T multimodal crowd counting methods, an RGB-T crowd counting method based on multimodal feature interaction and semantic-guided fusion is proposed. First, stacked small-scale convolution kernels are used as the branches of the backbone network to extract coarse features from each modality. Second, a multimodal feature interaction module is introduced to address the limited information interaction between modalities: it extracts the features of the RGB and thermal infrared modalities and lets the two modalities exchange information. Third, a semantic-guided fusion module strengthens the semantic relevance of the multimodal crowd features through global and local feature-guided fusion, so that multi-context information is fully integrated and crowd targets are recognized more reliably. Finally, a regression head generates the crowd density map and outputs the count. Experimental results show that the proposed method outperforms the compared algorithms on the public RGBT-CC dataset, reducing the root-mean-square error by 31.12% relative to the CMCRL method and achieving higher counting accuracy across various scenarios.
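The last two steps of the description (a density map turned into a count, and a comparison by root-mean-square error) can be illustrated with a minimal sketch. It assumes the usual crowd-counting convention that the estimated count is the sum of the density-map values. The function names and the toy numbers below are hypothetical and are not taken from the paper or from the RGBT-CC dataset.

```python
import math


def count_from_density(density_map):
    """Estimated count: the sum of all values in a 2-D density map."""
    return sum(sum(row) for row in density_map)


def rmse(pred_counts, true_counts):
    """Root-mean-square error between predicted and ground-truth counts."""
    if len(pred_counts) != len(true_counts) or not pred_counts:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - t) ** 2 for p, t in zip(pred_counts, true_counts)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline_error, proposed_error):
    """Percentage by which the proposed error is lower than the baseline."""
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Toy density maps, made up for illustration only.
toy_maps = [
    [[0.5, 0.5], [1.0, 0.0]],    # sums to 2.0
    [[0.25, 0.75], [1.5, 1.5]],  # sums to 4.0
]
true_counts = [3, 4]

pred_counts = [count_from_density(m) for m in toy_maps]
error = rmse(pred_counts, true_counts)
print(pred_counts, round(error, 4))
```

A figure such as "31.12% reduction relative to CMCRL" corresponds to `relative_reduction(baseline_rmse, proposed_rmse)`, where both arguments are RMSE values measured on the same test set.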
Created:
2026-01-15



