Windowed Attention Driven Multi-View Geometric Perception and Tracking
Source: DataCite Commons. Updated: 2026-03-19. Indexed: 2026-05-05.
Download link:
https://www.scidb.cn/detail?dataSetId=17d23188dcaa4087a83059545916fef5
Description:
Multi-Camera Multi-Object Tracking (MC-MOT) plays a pivotal role in applications such as intelligent video surveillance and autonomous driving. However, existing methods often suffer from feature discontinuity caused by severe cross-view occlusions, frequently necessitating reliance on computationally expensive backend graph optimization algorithms to compensate for frontend detection failures. To address these challenges, this paper proposes a novel unified perception framework driven by robust spatio-temporal representation. By deeply fusing spatio-temporal information during the feature extraction stage, the proposed method generates high-fidelity object features to fundamentally enhance tracking performance. First, a geometric projection module utilizing camera intrinsics and extrinsics is employed to lift and fuse multi-view 2D image features into a unified 3D Bird's-Eye-View (BEV) space. Second, a Windowed Spatio-Temporal Fusion (WSTF) module is designed to enhance feature robustness. This module utilizes a cross-attention mechanism where the BEV features of the current frame serve as Queries, while historical features within a local temporal window act as Keys and Values, effectively realizing feature denoising and temporal smoothing. Finally, leveraging these robust spatio-temporal representations, high-precision cross-frame data association is achieved using a standard Kalman Filter combined with the Hungarian algorithm, eliminating the need for complex backend optimization strategies. Experimental results on the challenging Wildtrack dataset demonstrate that the proposed method achieves highly competitive performance, recording an IDF1 of 92.23% (a 2.04% improvement over the baseline) and a MOTA of 89.70% (a 1.8% improvement). Furthermore, the method maintains state-of-the-art tracking results on the MultiviewX dataset. 
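The windowed cross-attention described above can be illustrated with a minimal sketch for a single BEV cell. It rests on assumptions not stated in the description: there are no learned Query/Key/Value projections, attention runs only over the same cell across past frames, and the names `windowed_cross_attention`, `query`, and `history` are hypothetical. It is not the authors' WSTF implementation.

```python
import math


def windowed_cross_attention(query, history):
    """Fuse one current-frame BEV cell feature with the same cell in past frames.

    query:   list of C floats (current frame, used as the Query)
    history: list of T lists of C floats (temporal window, used as Keys and Values)
    Returns (fused_feature, attention_weights).
    Assumption: no learned projections, so Keys and Values are the raw features.
    """
    scale = math.sqrt(len(query))
    # Scaled dot-product score between the Query and each historical Key.
    scores = [sum(q * k for q, k in zip(query, past)) / scale for past in history]
    # Numerically stable softmax over the temporal window.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the historical Values, one channel at a time.
    fused = [
        sum(w * past[c] for w, past in zip(weights, history))
        for c in range(len(query))
    ]
    return fused, weights


if __name__ == "__main__":
    current = [1.0, 0.0]
    window = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # frame 2 is an outlier
    fused, weights = windowed_cross_attention(current, window)
    print(fused, weights)
```

In this toy window the outlier frame receives the smallest weight, which is the smoothing effect the description attributes to the module.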
Comprehensive ablation studies further validate the significant effectiveness of the introduced spatio-temporal fusion module. The proposed unified framework effectively resolves critical issues regarding feature alignment and long-term occlusion in MC-MOT. This study demonstrates that the construction of end-to-end spatio-temporal representations yields substantial performance gains. Crucially, it proves that high-quality frontend features render basic association algorithms sufficient for handling complex tracking challenges, providing an efficient and concise new benchmark for the field.
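As a rough illustration of what a "basic association algorithm" involves, the sketch below predicts each track with a constant-velocity step (only the mean update of a Kalman predict step, with no covariance) and matches predictions to detections by minimum total ground-plane distance. Exhaustive search stands in for the Hungarian algorithm here: both return a minimum-cost assignment, but exhaustive search is only practical for a handful of objects. The function names, the `(x, y, vx, vy)` track layout, the `max_dist` gate, and the requirement that there are no more tracks than detections are all assumptions for illustration, not details from the description.

```python
import itertools
import math


def predict(track, dt=1.0):
    """Constant-velocity prediction of a track given as (x, y, vx, vy)."""
    x, y, vx, vy = track
    return (x + vx * dt, y + vy * dt)


def associate(tracks, detections, max_dist=1.0):
    """Return {track_index: detection_index} minimising total distance.

    Assumes len(tracks) <= len(detections). Matches farther apart than
    max_dist are dropped after the optimal assignment is found.
    """
    predicted = [predict(t) for t in tracks]
    cost = [[math.dist(p, d) for d in detections] for p in predicted]
    best, best_cost = {}, math.inf
    # Exhaustive search over one-to-one assignments (Hungarian stand-in).
    for perm in itertools.permutations(range(len(detections)), len(tracks)):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best_cost:
            best_cost, best = total, dict(enumerate(perm))
    # Gate out implausible matches.
    return {i: j for i, j in best.items() if cost[i][j] <= max_dist}


if __name__ == "__main__":
    tracks = [(0.0, 0.0, 1.0, 0.0), (5.0, 5.0, 0.0, -1.0)]
    detections = [(5.1, 4.0), (0.9, 0.1), (9.0, 9.0)]
    print(associate(tracks, detections))
```

With the sample inputs, track 0 is matched to detection 1 and track 1 to detection 0, while the distant third detection stays unmatched and would typically seed a new track.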
Provider:
Science Data Bank
Created:
2026-03-19



