RoboSense Track 1 Drive with Language Dataset
收藏RoboSense Track 1: Driving with Language 数据集概述
数据集简介
- 名称: RoboSense Track 1 Drive with Language Dataset
- 基础基准: 基于DriveLM和DriveBench基准
- 目标: 评估视觉语言模型(VLMs)在复杂城市环境中回答高级驾驶问题的能力
数据内容
- 输入数据:
- 多视角摄像头输入(来自nuScenes数据集)
- 自然语言指令(包含感知、预测和规划任务)
- 物体定位(通过场景中物体的中心点表示)
数据集统计
| 驾驶任务 | 问题数量 | 问题类型 |
|---|---|---|
| 感知 | 361 | 多选题(MCQs)、视觉问答(VQA) |
| 预测 | 522 | 多选题(MCQs) |
| 规划 | 513 | 视觉问答(VQA) |
VQA问题子类型
- VQA<sub>obj</sub>: 关于场景中物体的问题
- VQA<sub>scene</sub>: 关于整体场景的问题
基准性能
使用Qwen2.5-VL-7B-Instruct作为基准模型:
| 任务 | 问题类型 | 准确率(%) |
|---|---|---|
| 感知 | MCQ | 75.5 |
| VQA<sub>obj</sub> | 29.2 | |
| VQA<sub>scene</sub> | 22.2 | |
| 预测 | MCQ | 59.2 |
| 规划 | VQA<sub>obj</sub> | 29.6 |
| VQA<sub>scene</sub> | 31.2 | |
| 平均 | 所有类型 | 42.5 |
评估指标
- 准确率(Accuracy): 用于所有多选题(MCQs)
- LLM评分(LLM Score): 用于所有视觉问答(VQA),使用LLM根据详细评分标准对答案进行评分
相关资源
- 数据集地址: https://huggingface.co/datasets/robosense/datasets/tree/main/track1-driving-with-language
- 基准模型: https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct
- 相关论文:
- https://arxiv.org/abs/2501.04003
- DriveLM: Driving with graph visual question answering (ECCV 2024)
引用信息
bibtex @article{xie2025drivebench, title = {Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives}, author = {Xie, Shaoyuan and Kong, Lingdong and Dong, Yuhao and Sima, Chonghao and Zhang, Wenwei and Chen, Qi Alfred and Liu, Ziwei and Pan, Liang}, journal = {arXiv preprint arXiv:2501.04003}, year = {2025} }
bibtex @inproceedings{sima2024drivelm, title = {DriveLM: Driving with graph visual question answering}, author = {Sima, Chonghao and Renz, Katrin and Chitta, Kashyap and Chen, Li and Zhang, Hanxue and Xie, Chengen and Bei{ss}wenger, Jens and Luo, Ping and Geiger, Andreas and Li, Hongyang}, booktitle = {European Conference on Computer Vision}, pages = {256-274}, year = {2024}, organization = {Springer} }




