You can only watch the past: track attention network for online spatio-temporal action detection

Source: 中国科学数据 (China Scientific Data). Updated: 2026-01-12. Indexed: 2026-04-25.
Download link:
https://www.sciengine.com/AA/doi/10.1007/s11432-024-4501-3
Resource description:
Online spatio-temporal action detection (OSTAD) aims to identify and localize action instances in real-time video streams without accessing future frames. However, the online setting imposes strict constraints of incremental inference, limited memory, and causal processing, which severely restrict the availability of effective information. To address this, we propose the track attention network (TAN), introducing a history-aware track-and-detect paradigm. Instead of detecting actions independently at each frame, TAN leverages historical detection results and spatio-temporal continuity to enhance current-frame features. Specifically, we propose three strategies. First, a history-aware actor distribution prediction strategy estimates current actor distributions based on spatial continuity and appearance similarity. Second, an actor distribution inference strategy via track attention introduces two attention modules—track channel attention and track efficient attention—to model semantic relations among actor distributions for robust fusion. Third, a history-aware feature modulation strategy injects localization priors from actor distributions into action features, improving representation quality and detection accuracy. Extensive experiments on the JHMDB21 and UCF24 benchmarks demonstrate the effectiveness of our method. TAN achieves 80.3% frame-level mAP (f-mAP) and 88.3% video-level mAP (v-mAP) on JHMDB21, and 88.1% f-mAP and 54.8% v-mAP on UCF24, outperforming existing online methods and even several offline approaches.
Created:
2025-06-20
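
The online constraints named in the description (causal processing, limited memory, incremental inference) and the idea of using spatial continuity with past detections can be illustrated with a small sketch. This is not the TAN architecture: the names `OnlineHistory`, `prior_for`, and `process_stream` are hypothetical, and plain box IoU against the previous frame stands in for the learned actor distribution prediction, purely as an assumption for illustration.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class OnlineHistory:
    """Bounded buffer of past detections (hypothetical helper, not from the paper)."""

    def __init__(self, max_frames: int = 8) -> None:
        # deque(maxlen=...) drops the oldest frame automatically: limited memory.
        self.frames: Deque[List[Box]] = deque(maxlen=max_frames)

    def prior_for(self, box: Box) -> float:
        """Best IoU between `box` and the boxes of the most recent past frame."""
        if not self.frames:
            return 0.0
        return max((iou(box, past) for past in self.frames[-1]), default=0.0)

    def update(self, boxes: Iterable[Box]) -> None:
        self.frames.append(list(boxes))


def process_stream(stream: Iterable[List[Box]], max_frames: int = 8) -> List[List[float]]:
    """Process frames one at a time; each frame sees only earlier frames."""
    history = OnlineHistory(max_frames)
    priors: List[List[float]] = []
    for boxes in stream:
        # Causal: the prior is computed before the current frame enters the history.
        priors.append([history.prior_for(b) for b in boxes])
        history.update(boxes)
    return priors


if __name__ == "__main__":
    frames = [[(0, 0, 10, 10)], [(0, 0, 10, 10)], [(20, 20, 30, 30)]]
    print(process_stream(frames))
```

In this sketch the per-box prior is a single scalar. According to the description, TAN instead estimates actor distributions from spatial continuity and appearance similarity, fuses them with track attention, and uses them to modulate action features, none of which is modeled here.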