KEPT: Knowledge‑Enhanced Prediction of Trajectories from Consecutive Driving Frames with Vision-Language Models

ETS-Data2025-12-31 更新2026-02-07 收录

下载链接：

https://doi.org/10.26599/ETSD.2025.9190073

下载链接

链接失效反馈

官方服务：

资源简介：

Accurate short-horizon trajectory prediction is crucial for safe and reliable autonomous driving. However, existing vision-language models (VLMs) often fail to accurately understand driving scenes and generate trustworthy trajectories. To address this challenge, this paper introduces KEPT, a knowledge-enhanced VLM framework that predicts ego trajectories directly from consecutive front-view driving frames. KEPT integrates a temporal frequency–spatial fusion (TFSF) video encoder, which is trained via self-supervised learning with hard-negative mining, with a k-means & HNSW retrieval-augmented generation (RAG) pipeline. Retrieved prior knowledge is added into chain-of-thought (CoT) prompts with explicit planning constraints, while a triple-stage fine-tuning paradigm aligns the VLM backbone to enhance spatial perception and trajectory prediction capabilities. This replication package includes all materials required for readers to understand and reproduce the analyses reported in the paper.

5,000+

优质数据集

54 个

任务类型

进入经典数据集