Distance-Annotated Traffic Perception Question Answering (DTPQA)
Source: Mendeley Data. Indexed: 2026-04-18
Download link:
https://data.mendeley.com/datasets/9rj4kyrx9k
Description:
Distance-Annotated Traffic Perception Question Answering (DTPQA) is a Visual Question Answering (VQA) benchmark designed specifically for traffic scenarios: it evaluates the perception capabilities of Vision-Language Models (VLMs) using simple yet crucial questions relevant to driving decisions. It consists of two parts: a synthetic benchmark (DTP-Synthetic) created using a simulator, and a real-world benchmark (DTP-Real) built on top of existing images of real traffic scenes. DTPQA also includes distance annotations, i.e., how far the object in question is from the camera. Each DTPQA sample consists of (at least): (a) an image, (b) a question, (c) the ground truth answer, and (d) the distance of the object in question, which enables analysis of how VLM performance degrades with increasing object distance.
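The four-part sample structure and the distance-based analysis described above can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration only: the `DTPQASample` class, its field names, the use of metres as the distance unit, the 10-unit bin width, and the exact-match scoring rule are hypothetical and do not reflect the dataset's actual file layout or the authors' evaluation protocol.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DTPQASample:
    """Hypothetical container for one sample; field names are assumptions."""
    image_path: str    # (a) the image
    question: str      # (b) the question
    answer: str        # (c) the ground truth answer
    distance_m: float  # (d) object distance; metres is an assumed unit


def accuracy_by_distance(samples, predictions, bin_width=10.0):
    """Group samples into distance bins and return accuracy per bin.

    Keys of the returned dict are the lower edge of each bin.
    Scoring is case-insensitive exact match (an assumed rule).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for sample, prediction in zip(samples, predictions):
        bin_start = int(sample.distance_m // bin_width) * bin_width
        totals[bin_start] += 1
        if prediction.strip().lower() == sample.answer.strip().lower():
            correct[bin_start] += 1
    return {b: correct[b] / totals[b] for b in sorted(totals)}


if __name__ == "__main__":
    # Made-up samples purely to exercise the function.
    demo = [
        DTPQASample("img_0.png", "Is there a pedestrian?", "yes", 5.0),
        DTPQASample("img_1.png", "Is there a pedestrian?", "no", 8.0),
        DTPQASample("img_2.png", "Is there a pedestrian?", "no", 25.0),
        DTPQASample("img_3.png", "Is there a pedestrian?", "no", 27.0),
    ]
    model_outputs = ["Yes", "no", "yes", "no"]
    for bin_start, acc in accuracy_by_distance(demo, model_outputs).items():
        print(f"[{bin_start:.0f}, {bin_start + 10:.0f}): accuracy {acc:.2f}")
```

Plotting the per-bin accuracy against the bin edges would give one simple view of how performance changes with object distance; the bin width is a free parameter and should be chosen to suit the distance range actually present in the data.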
Created:
2026-02-24



