DADA2000

Name: DADA2000
Creator: OpenDataLab
Published: 2026-05-17 04:30:34
License: Not specified

OpenDataLab; updated 2026-05-17; indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/DADA2000

Resource description:

Driver attention prediction is becoming an essential research problem for human-like driving systems. This work aims to predict driver attention in driving accident scenarios (DADA). The task is challenging because traffic scenes are dynamic and accident categories are complex and imbalanced. We design a semantic context induced attentive fusion network (SCAFNet). We first segment the RGB video frames into images with distinct semantic regions (i.e., semantic images), where each region denotes one semantic category of the scene (e.g., road, trees), and learn the spatio-temporal features of the RGB frames and the semantic images simultaneously in two parallel paths. The learned features are then fused by an attentive fusion network to capture the semantic-induced scene variation relevant to driver attention prediction. The contributions are three-fold: 1) using the semantic images, we introduce their semantic context features, modeled by a graph convolutional network (GCN) over the semantic images, and verify that they clearly improve driver attention prediction; 2) we fuse the semantic context features of the semantic images with the features of the RGB frames using an attentive strategy, and propagate the fused details across frames with a convolutional LSTM module to obtain an attention map for each video frame while accounting for how the scene has changed over preceding frames; 3) we evaluate the proposed method against state-of-the-art approaches on our previously collected dataset (named DADA-2000) and two other challenging datasets.
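The two building blocks named in the description, a graph convolution over semantic regions and an attentive fusion of the semantic and RGB features, can be illustrated with a rough NumPy sketch. Everything below is an assumption made for illustration: the function names `gcn_layer` and `attentive_fusion`, the normalized-adjacency GCN variant, and the sigmoid gating formula are not taken from the SCAFNet implementation, which this page does not specify.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj:    (N, N) adjacency between semantic regions (hypothetical graph)
    feats:  (N, F_in) per-region features
    weight: (F_in, F_out) learnable projection
    """
    a_hat = adj + np.eye(adj.shape[0])       # add self-loops
    deg = a_hat.sum(axis=1)                  # degrees, >= 1 thanks to self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization of the adjacency matrix.
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weight, 0.0)

def attentive_fusion(rgb_feat, sem_feat):
    """Re-weight RGB features with a spatial gate derived from semantic features.

    Both inputs have shape (C, H, W). The gate is an assumed, illustrative
    choice: a sigmoid over the channel-averaged semantic features.
    """
    gate = 1.0 / (1.0 + np.exp(-sem_feat.mean(axis=0, keepdims=True)))  # (1, H, W)
    return rgb_feat * (1.0 + gate)           # residual-style re-weighting
```

In a full model along the lines described above, the fused per-frame features would then be passed through a convolutional LSTM to produce the attention map for each frame; that recurrent stage is omitted here for brevity.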

Provider:

OpenDataLab

Created:

2022-11-18

Collected summary

Dataset introduction