Ablation results on Nerf-Det [28].
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Ablation_results_on_Nerf-Det_28_/29100952
Description:
3D object detection from image data alone is a significant challenge in computer vision, primarily because geometric perception must be derived from visual inputs. The key to overcoming this challenge lies in effectively capturing the geometric relationships across multiple viewpoints, thereby establishing strong geometric priors. Current methods commonly back-project voxels onto images to align voxel and pixel features, yet in this process pixel features are insufficiently involved in learning, which reduces geometric perception accuracy and, in turn, detection performance. To address this limitation, we propose a novel network framework called ImVoxelGNet. The framework first integrates the features projected onto pixels via an expansion operation, compensating for the pixel information that traditional back-projection methods underuse and enabling more precise learning of spatial geometric features. In addition, we design an implicit geometric perception structure that further refines the spatial geometric features obtained after integrating image features: it learns the occupancy relationships among spatial voxels and updates the spatial features accordingly. Finally, a detection head with 3D convolutions generates the predictions. Evaluation on the ScanNetV2 multi-view 3D object detection dataset shows that ImVoxelGNet improves mean average precision (mAP) by up to 2.2%. This gain indicates that improved geometric perception and more comprehensive scene understanding benefit 3D object detection. Code and data are available at https://github.com/xug-coder/ImVoxelGNet.
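For readers unfamiliar with the conventional voxel back-projection step that the description refers to, the following is a minimal sketch of the idea, not the ImVoxelGNet implementation. It assumes a pinhole camera, voxel centers already expressed in camera coordinates, and nearest-neighbor sampling; the function names and parameters (`project_voxel`, `sample_voxel_features`, `fx`, `fy`, `cx`, `cy`) are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def project_voxel(center: Vec3, fx: float, fy: float,
                  cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Project a voxel center (camera coordinates) to pixel coordinates (u, v).

    Assumes a pinhole model; returns None for points behind the camera.
    """
    x, y, z = center
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def sample_voxel_features(centers: Sequence[Vec3],
                          feature_map: Sequence[Sequence[float]],
                          fx: float, fy: float,
                          cx: float, cy: float) -> List[Optional[float]]:
    """Assign each voxel the feature of the nearest pixel it projects to.

    Voxels that fall outside the image or behind the camera get None.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    sampled: List[Optional[float]] = []
    for center in centers:
        uv = project_voxel(center, fx, fy, cx, cy)
        if uv is None:
            sampled.append(None)
            continue
        # Round to the nearest pixel (half values round up).
        col = int(math.floor(uv[0] + 0.5))
        row = int(math.floor(uv[1] + 0.5))
        if 0 <= row < height and 0 <= col < width:
            sampled.append(feature_map[row][col])
        else:
            sampled.append(None)
    return sampled


if __name__ == "__main__":
    # Toy 3x3 single-channel feature map and four voxel centers.
    features = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]
    voxels = [(0.0, 0.0, 1.0), (1.0, -1.0, 1.0), (0.0, 0.0, -1.0), (5.0, 0.0, 1.0)]
    print(sample_voxel_features(voxels, features, 1.0, 1.0, 1.0, 1.0))
```

In this scheme each voxel reads exactly one pixel, so most pixels never contribute to any voxel. That is one way to picture the underuse of pixel information that the description says the expansion operation is meant to compensate for.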
Created:
2025-05-19



