KevinNotSmile/nuscenes-qa-mini

Name: KevinNotSmile/nuscenes-qa-mini
Creator: KevinNotSmile
Published: 2024-01-19 03:02:03
License: CC BY-NC-SA 4.0

Source: Hugging Face. Updated 2024-01-19; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/KevinNotSmile/nuscenes-qa-mini


Resource overview:

---
license: cc-by-nc-sa-4.0
task_categories:
- visual-question-answering
- text-generation
language:
- en
size_categories:
- 1K<n<10K
configs:
- config_name: day
  data_files:
  - split: train
    path: "day-train/*"
  - split: validation
    path: "day-validation/*"
- config_name: night
  data_files:
  - split: train
    path: "night-train/*"
  - split: validation
    path: "night-validation/*"
---

# NuScenes-QA-mini Dataset

## TL;DR:

This dataset is used for multimodal question-answering tasks in autonomous driving scenarios. We created this dataset based on the [nuScenes-QA dataset](https://github.com/qiantianwen/NuScenes-QA) for evaluation in our paper [Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI](https://arxiv.org/abs/2312.07886). The samples are divided into day and night scenes.

|scene|# train samples|# validation samples|
|---|---|---|
|day|2,229|2,229|
|night|659|659|

|Each sample contains|
|---|
|original token id in nuscenes database|
|RGB images from 6 views (front, front left, front right, back, back left, back right)|
|5D LiDAR point cloud (distance, intensity, X, Y, and Z axes)|
|question-answer pairs|

## Detailed Description

This dataset is built on the [nuScenes](https://www.nuscenes.org/) mini-split, where we obtain the QA pairs from the original [nuScenes-QA dataset](https://github.com/qiantianwen/NuScenes-QA). The data in the nuScenes-QA dataset is collected from driving scenes in the cities of Boston and Singapore, with diverse locations, times, and weather conditions.

<img src="nuqa_example.PNG" alt="fig1" width="600"/>

Each data sample contains **6-view RGB camera captures, a 5D LiDAR point cloud, and a corresponding text QA pair**. Each LiDAR point cloud includes 5 dimensions of data: distance, intensity, and the X, Y, and Z axes. In this dataset, the questions are generally difficult and may require multiple hops of reasoning over the RGB and LiDAR data. For example, to answer the sample question in the above figure, the ML model needs to first identify in which direction the “construction vehicle” appears, and then count the number of “parked trucks” in that direction. In our evaluations, we further cast question answering (QA) as an open-ended text generation task. This is more challenging than the evaluation setup in the original nuScenes-QA [paper](https://arxiv.org/abs/2305.14836), where an answer set is predefined and the QA task is a classification task over this predefined answer set.

<img src="image_darken.PNG" alt="fig2" width="600"/>

In most RGB images in the nuScenes dataset, as shown on the left of the above figure, the night scenes are still well lit (e.g., by street lights). We hence further reduce the brightness of RGB captures in night scenes by 80% and apply Gaussian blur with a radius of 7, as shown on the right of the above figure. By applying this preprocessing to the RGB views in night scenes, we obtain the training and validation splits of night scenes, with 659 samples in each split. The RGB views in daytime scenes, on the other hand, are left unchanged. The day configuration contains 2,229 samples for training and 2,229 for validation.

## How to Use

```py
from datasets import load_dataset

# load train split in day scene
day_train = load_dataset("KevinNotSmile/nuscenes-qa-mini", "day", split="train")
```

## Citation

If you find our dataset useful, please consider citing

```
@inproceedings{caesar2020nuscenes,
  title={nuscenes: A multimodal dataset for autonomous driving},
  author={Caesar, Holger and Bankiti, Varun and Lang, Alex H and Vora, Sourabh and Liong, Venice Erin and Xu, Qiang and Krishnan, Anush and Pan, Yu and Baldan, Giancarlo and Beijbom, Oscar},
  booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
  pages={11621--11631},
  year={2020}
}

@article{qian2023nuscenes,
  title={NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario},
  author={Qian, Tianwen and Chen, Jingjing and Zhuo, Linhai and Jiao, Yang and Jiang, Yu-Gang},
  journal={arXiv preprint arXiv:2305.14836},
  year={2023}
}

@article{huang2023modality,
  title={Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI},
  author={Huang, Kai and Yang, Boyuan and Gao, Wei},
  journal={arXiv preprint arXiv:2312.07886},
  year={2023}
}
```

## License

[![CC BY-NC-SA 4.0][cc-by-nc-sa-shield]][cc-by-nc-sa]

In line with the original nuScenes license, this work is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png
[cc-by-nc-sa-shield]: https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg


Provider:

KevinNotSmile

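Summary of original information

NuScenes-QA-mini dataset

Overview

The NuScenes-QA-mini dataset is used for multimodal question answering in autonomous driving scenarios. It was created from the nuScenes-QA dataset for evaluation in the paper Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI. The samples are divided into day and night scenes.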

Scene	Train samples	Validation samples
Day	2,229	2,229
Night	659	659
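
As a quick sanity check on the table, the following minimal sketch sums the four split sizes and confirms that the total falls inside the `1K<n<10K` size category declared in the dataset card. The dictionary layout and the helper names `total_samples` and `night_fraction` are illustrative assumptions, not part of the dataset.

```python
# Sample counts per (scene, split), copied from the table above.
COUNTS = {
    ("day", "train"): 2229,
    ("day", "validation"): 2229,
    ("night", "train"): 659,
    ("night", "validation"): 659,
}


def total_samples(counts=COUNTS):
    """Total number of samples across all scenes and splits."""
    return sum(counts.values())


def night_fraction(counts=COUNTS):
    """Share of samples that come from night scenes."""
    night = sum(n for (scene, _split), n in counts.items() if scene == "night")
    return night / total_samples(counts)


if __name__ == "__main__":
    print(total_samples())  # 5776
    print(1000 < total_samples() < 10000)  # True
```

Night scenes make up a bit under a quarter of the data, which matters when reporting a single accuracy figure over both configurations.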

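Each sample contains:

The original token ID in the nuScenes database
RGB images from 6 views (front, front left, front right, back, back left, back right)
A 5D LiDAR point cloud (distance, intensity, and the X, Y, and Z axes)
Question-answer pairs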
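
To make the "5D" point layout concrete, here is a minimal sketch of one way to represent a LiDAR point. It assumes the five values are stored in the order listed above (distance, intensity, X, Y, Z) and arrive as a flat sequence; the actual storage format in the dataset may differ, and `LidarPoint` and `to_points` are hypothetical names.

```python
from typing import List, NamedTuple, Sequence


class LidarPoint(NamedTuple):
    """One LiDAR return; the field order is an assumption based on the listing."""
    distance: float
    intensity: float
    x: float
    y: float
    z: float


def to_points(flat: Sequence[float]) -> List[LidarPoint]:
    """Group a flat sequence of floats into 5-field points."""
    if len(flat) % 5 != 0:
        raise ValueError("expected a multiple of 5 values")
    return [LidarPoint(*flat[i:i + 5]) for i in range(0, len(flat), 5)]


if __name__ == "__main__":
    points = to_points([5.0, 0.3, 3.0, 4.0, 0.0, 1.0, 0.9, 1.0, 0.0, 0.0])
    print(points[0].x, points[1].intensity)
```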

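Detailed description

The dataset is built on the nuScenes mini-split, with question-answer pairs taken from the original nuScenes-QA dataset. The data in nuScenes-QA was collected from urban driving scenes in Boston and Singapore, covering diverse locations, times, and weather conditions.

Each data sample contains 6-view RGB camera captures, a 5D LiDAR point cloud, and a corresponding text question-answer pair. Each LiDAR point cloud carries 5 dimensions of data: distance, intensity, and the X, Y, and Z axes. The questions in this dataset are generally difficult and may require multiple hops of reasoning over the RGB and LiDAR data. For example, to answer the sample question in the figure above, the ML model first needs to determine in which direction the "construction vehicle" appears, and then count the "parked trucks" in that direction. In our evaluations, we treat question answering (QA) as an open-ended text generation task. This is more challenging than the evaluation setup in the original nuScenes-QA paper, where an answer set is predefined and QA is a classification task over that set.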
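
The difference between the two setups shows up in scoring: with a predefined answer set the model picks a class, whereas open-ended generation yields free text that has to be compared with the reference. The sketch below uses normalized exact match as one simple way to score generated answers. This metric is an assumption chosen for illustration; the source does not state which metric the paper uses, and `normalize`, `exact_match`, and `accuracy` are hypothetical helpers.

```python
import re
import string
from typing import Sequence


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def exact_match(prediction: str, reference: str) -> bool:
    """True when the two answers agree after normalization."""
    return normalize(prediction) == normalize(reference)


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of generated answers that exactly match their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    hits = sum(exact_match(p, r) for p, r in zip(predictions, references))
    return hits / len(references)


if __name__ == "__main__":
    print(accuracy(["Yes.", "three"], ["yes", "two"]))
```

Note that exact match penalizes answers such as "2" when the reference is "two", which is part of what makes the open-ended setting harder than classification.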

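In most RGB images in the nuScenes dataset, as shown on the left of the figure above, night scenes are still well lit (for example, by street lights). We therefore further reduce the brightness of the RGB captures in night scenes by 80% and apply a Gaussian blur with a radius of 7, as shown on the right of the figure above. Applying this preprocessing to the RGB views of night scenes yields the night training and validation splits, with 659 samples each. The RGB views of day scenes are left unchanged. The day configuration contains 2,229 training samples and 2,229 validation samples.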
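
The night-scene preprocessing can be sketched in plain Python on a single-channel image stored as a list of rows. This is an illustration only, not the authors' code: the source does not name the imaging library used, and the mapping from "radius 7" to a kernel half-width of 7 with `sigma=3.5`, as well as the edge clamping, are assumptions.

```python
import math
from typing import List

Grid = List[List[float]]  # one channel, row-major pixel values


def darken(img: Grid, reduction: float = 0.8) -> Grid:
    """Reduce brightness by `reduction` (0.8 keeps 20% of each value)."""
    factor = 1.0 - reduction
    return [[v * factor for v in row] for row in img]


def gaussian_kernel(radius: int, sigma: float) -> List[float]:
    """Normalized 1D Gaussian weights over [-radius, radius]."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values: List[float], kernel: List[float]) -> List[float]:
    """Convolve one row or column, clamping indices at the edges."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * values[j]
        out.append(acc)
    return out


def gaussian_blur(img: Grid, radius: int = 7, sigma: float = 3.5) -> Grid:
    """Separable Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(radius, sigma)
    rows = [blur_1d(row, kernel) for row in img]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def simulate_night(img: Grid) -> Grid:
    """Darken by 80%, then blur, mirroring the two steps described above."""
    return gaussian_blur(darken(img))
```

In practice an imaging library would perform both steps on all three color channels; the sketch only shows what each step does to the pixel values.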

Usage

    from datasets import load_dataset

    # Load the train split of the day scenes
    day_train = load_dataset("KevinNotSmile/nuscenes-qa-mini", "day", split="train")
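
The same call works for the other configuration and split. The sketch below enumerates the four (configuration, split) pairs declared in the dataset card and loads them in one pass. `config_split_pairs` and `load_all` are hypothetical helper names; `load_all` needs the `datasets` package and network access, so the import is deferred until it is called.

```python
from itertools import product
from typing import Dict, List, Tuple

REPO_ID = "KevinNotSmile/nuscenes-qa-mini"
CONFIGS = ("day", "night")          # config names from the dataset card
SPLITS = ("train", "validation")    # split names from the dataset card


def config_split_pairs() -> List[Tuple[str, str]]:
    """All (config, split) combinations declared in the dataset card."""
    return list(product(CONFIGS, SPLITS))


def load_all(repo_id: str = REPO_ID) -> Dict[Tuple[str, str], object]:
    """Load every configuration and split; downloads data on first use."""
    from datasets import load_dataset  # deferred: optional dependency

    return {
        (config, split): load_dataset(repo_id, config, split=split)
        for config, split in config_split_pairs()
    }


if __name__ == "__main__":
    for config, split in config_split_pairs():
        print(config, split)
```

Keeping day and night as separate entries makes it straightforward to report results per lighting condition.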

Citation

If you find our dataset useful, please consider citing:

@inproceedings{caesar2020nuscenes, title={nuscenes: A multimodal dataset for autonomous driving}, author={Caesar, Holger and Bankiti, Varun and Lang, Alex H and Vora, Sourabh and Liong, Venice Erin and Xu, Qiang and Krishnan, Anush and Pan, Yu and Baldan, Giancarlo and Beijbom, Oscar}, booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition}, pages={11621--11631}, year={2020} }

@article{qian2023nuscenes, title={NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario}, author={Qian, Tianwen and Chen, Jingjing and Zhuo, Linhai and Jiao, Yang and Jiang, Yu-Gang}, journal={arXiv preprint arXiv:2305.14836}, year={2023} }

@article{huang2023modality, title={Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI}, author={Huang, Kai and Yang, Boyuan and Gao, Wei}, journal={arXiv preprint arXiv:2312.07886}, year={2023} }

License

In line with the original nuScenes license, this dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

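Collected summary

Dataset introduction

Construction

Within autonomous-driving multimodal question answering research, NuScenes-QA-mini is built on the nuScenes mini-split, with question-answer pairs taken from the original NuScenes-QA dataset. The data was collected from urban driving scenes in Boston and Singapore and covers diverse locations, times, and weather conditions. Each sample combines six-view RGB images, a five-dimensional LiDAR point cloud, and a corresponding text question-answer pair. The RGB images of night scenes are preprocessed with an 80% brightness reduction and a Gaussian blur to widen the gap in lighting conditions. The resulting dataset contains 2,229 training and 2,229 validation samples for day scenes, and 659 of each for night scenes.

Usage

The dataset can be loaded directly through the Hugging Face datasets library by selecting a scene configuration. For the day training set, for example, call load_dataset with the dataset name, the scene configuration, and the split. Each sample is structured and contains the original data identifier, the multi-view images, the LiDAR point cloud, and the question-answer text. Researchers can build multimodal fusion models on top of it that jointly interpret image and point-cloud data to generate natural-language answers, for evaluation and algorithm development in areas such as autonomous-driving scene understanding and embodied AI.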

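Background and challenges

Background overview

Demand for multimodal perception and reasoning in autonomous driving keeps growing, and NuScenes-QA-mini was created to advance research on visual question answering in complex driving scenes. The dataset was built by the KevinNotSmile team in 2023 from the original NuScenes-QA dataset. Its core research question is how to use multimodal data (six-view RGB images and five-dimensional LiDAR point clouds) for open-ended visual question answering. It was designed to evaluate the elastic modality adaptation of multimodal large language models in embodied AI settings. By covering scenes under both day and night lighting, it gains diversity and realism, and it serves as a benchmark for environment understanding and decision reasoning in autonomous driving systems.

Current challenges

The domain challenge the dataset targets is complex visual question answering in autonomous driving scenes: models must fuse heterogeneous sensor data (such as images and point clouds) and perform multi-hop reasoning to answer questions about spatial relations, object attributes, and dynamic scenes. Two challenges arose during construction. First, night scenes in the original data are still fairly well lit and do not represent extreme low-light conditions, so brightness was reduced and Gaussian blur was added to make the night data harder. Second, the question answering task was recast from its original classification form into open-ended text generation, which requires models to produce free-form answers and places higher demands on multimodal fusion and language generation.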

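Common scenarios

Classic use cases

In visual scene understanding for autonomous driving, NuScenes-QA-mini provides a benchmark for multimodal question answering. It combines six-view RGB images, five-dimensional LiDAR point clouds, and corresponding text question-answer pairs to evaluate how well models reason in complex driving environments. Its classic use is training and validating multimodal large language models so that they can interpret image and point-cloud data through multi-hop reasoning, for example identifying target objects in a given direction and counting them, which mirrors the perception and reasoning involved in real driving decisions.

Academic problems addressed

The dataset addresses the challenges of multimodal fusion and scene understanding in autonomous driving research. It targets two limitations of traditional visual question answering tasks, single-modality input and shallow reasoning, by adding LiDAR point clouds and multi-view images that push models toward deeper cross-modal reasoning. It gives researchers a standardized evaluation framework and supports work on adapting multimodal large models to embodied AI, in particular robustness analysis under lighting changes (day versus night scenes), laying groundwork for elastic modality adaptation techniques.

Practical applications

In practice, NuScenes-QA-mini can support the development of perception modules for autonomous driving systems. By covering diverse urban road scenes (such as day and night environments in Boston and Singapore), it helps improve in-vehicle AI capabilities such as obstacle recognition, path planning, and risk prediction. Its preprocessed night data (reduced brightness and Gaussian blur) further helps systems adapt to low-light conditions and provides data support for deploying safe-driving technology.

Recent research on the dataset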