KEPT: Knowledge‑Enhanced Prediction of Trajectories from Consecutive Driving Frames with Vision-Language Models
ETS-Data · Updated 2025-12-31 · Indexed 2026-02-07
Download link:
https://doi.org/10.26599/ETSD.2025.9190073
Resource description:
Accurate short-horizon trajectory prediction is crucial for safe and reliable autonomous driving. However, existing vision-language models (VLMs) often fail to understand driving scenes accurately or to generate trustworthy trajectories. To address this challenge, this paper introduces KEPT, a knowledge-enhanced VLM framework that predicts ego trajectories directly from consecutive front-view driving frames. KEPT combines a temporal frequency–spatial fusion (TFSF) video encoder, trained via self-supervised learning with hard-negative mining, with a retrieval-augmented generation (RAG) pipeline built on k-means and HNSW. Retrieved prior knowledge is inserted into chain-of-thought (CoT) prompts with explicit planning constraints, while a triple-stage fine-tuning paradigm aligns the VLM backbone to enhance spatial perception and trajectory prediction capabilities. This replication package includes all materials required for readers to understand and reproduce the analyses reported in the paper.
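
The retrieval step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the embeddings, the trajectory notes, and the function names (`kmeans`, `retrieve`) are hypothetical, the centroid initialization is a simple strided pick, and an exact brute-force scan inside the selected cluster stands in for the HNSW approximate nearest-neighbor index. It only shows the general idea of routing a query embedding to a k-means cluster and then returning the closest stored items as prior knowledge for a prompt.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))


def kmeans(vectors, k, iters=20):
    """Plain Lloyd's k-means with a strided initialization (an assumption)."""
    centroids = [list(vectors[i * len(vectors) // k]) for i in range(k)]
    for _ in range(iters):
        assign = [nearest(v, centroids) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Recompute assignments so they match the final centroids.
    return centroids, [nearest(v, centroids) for v in vectors]


def retrieve(query, vectors, payloads, centroids, assign, top_k=2):
    """Route the query to one cluster, then rank that cluster's members.

    The exact scan below is a stand-in for an HNSW index lookup.
    """
    cluster = nearest(query, centroids)
    candidates = [i for i, a in enumerate(assign) if a == cluster]
    candidates.sort(key=lambda i: sq_dist(query, vectors[i]))
    return [payloads[i] for i in candidates[:top_k]]


# Hypothetical 2-D scene embeddings with made-up trajectory notes.
embeddings = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
notes = [
    "straight, constant speed",
    "straight, slight deceleration",
    "straight, slight acceleration",
    "left turn, slow",
    "left turn, moderate",
    "left turn, yielding",
]

centroids, assign = kmeans(embeddings, k=2)
retrieved = retrieve((0.2, 0.1), embeddings, notes, centroids, assign)
print(retrieved)
```

In a pipeline of the kind the abstract describes, the retrieved items would then be written into the prompt text alongside the planning constraints; how KEPT formats that prompt is not specified here.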



