Self-Supervised Visual Odometry with RGB-D Bimodal Mutual Guidance

Source: China Scientific Data (中国科学数据) | Updated 2026-04-13 | Indexed 2026-04-25

Download link:

https://www.sciengine.com/AA/doi/10.12466/xhcl.2026.03.009

Resource description:

Visual odometry, which estimates camera poses from image sequences, plays a vital role in robotic navigation, autonomous driving, and augmented reality. Self-supervised visual odometry has become a research focus because it does not depend on ground-truth pose data. It jointly optimizes pose and depth estimation by constructing a self-supervised loss based on geometric consistency across views. A key challenge in this framework is designing network architectures that fully exploit the complementary pose-related cues in RGB images and depth maps. Existing methods often overlook the heterogeneous characteristics and complementary value of the two modalities, which leads to insufficient cue utilization and limited pose estimation accuracy. To address this issue, this paper proposes BMG-VO, a self-supervised visual odometry method with RGB-D bimodal mutual guidance. Specifically, an RGB-guided depth detail enhancement module incorporates texture and color priors from RGB images into the shallow layers of the depth encoding branch. This strengthens the ability of depth features to capture fine details such as edges and textures, thereby improving the robustness of feature matching. Meanwhile, a depth-guided RGB semantic enhancement module reinforces the high-level features of the RGB encoding branch with geometric structure and intra-class consistency cues derived from depth maps. This increases robustness to illumination variations and provides more reliable matching features for pose regression. Additionally, a unimodal filtering module highlights the most essential pose-related cues within each individual modality. Extensive experiments on the KITTI dataset demonstrate that BMG-VO achieves higher pose estimation accuracy than state-of-the-art self-supervised methods while also attaining excellent depth estimation performance.
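The abstract does not spell out the self-supervised loss, so the following is only a minimal sketch of the view-synthesis formulation commonly used in self-supervised depth and pose learning, not BMG-VO's actual implementation. Each target pixel is back-projected with its predicted depth, moved by the predicted relative pose, and projected into the source view; the intensity difference between the target and the sampled source pixel then serves as the training signal. The function names `reproject` and `photometric_l1`, the single-channel images, and the nearest-neighbour sampling are all assumptions made for brevity; practical systems typically use differentiable bilinear sampling.

```python
import numpy as np

def reproject(depth, K, T):
    """Map each target pixel to its (u, v) location in the source view.

    depth: (H, W) depth map of the target frame
    K:     (3, 3) camera intrinsics
    T:     (4, 4) rigid transform from target camera to source camera
    Returns an (H, W, 2) array of source pixel coordinates.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))       # each (H, W)
    pix = np.stack([u, v, np.ones_like(u)], axis=0)
    pix = pix.reshape(3, -1).astype(np.float64)          # homogeneous pixels
    pts = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)  # 3D points, target camera
    pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])
    pts_src = (T @ pts_h)[:3]                            # 3D points, source camera
    proj = K @ pts_src
    uv = proj[:2] / proj[2:3]                            # perspective divide
    return uv.T.reshape(H, W, 2)

def photometric_l1(target, source, uv):
    """Mean absolute difference between the target image and the source
    image sampled at uv (nearest neighbour, in-bounds pixels only)."""
    H, W = target.shape
    ui = np.rint(uv[..., 0]).astype(int)
    vi = np.rint(uv[..., 1]).astype(int)
    valid = (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)
    warped = source[vi[valid], ui[valid]]
    return np.abs(target[valid] - warped).mean()

# Hypothetical toy inputs: constant depth and an identity pose.
K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]])
depth = np.full((4, 5), 10.0)
image = np.arange(20, dtype=float).reshape(4, 5)
loss = photometric_l1(image, image, reproject(depth, K, np.eye(4)))
```

With an identity pose and identical images the loss is zero; an incorrect depth or pose prediction would misalign the warp and raise it, which is what lets the two predictions be trained without ground-truth poses.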

Created:

2026-04-13
