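Audio-Visual Traffic Anomaly Understanding Dataset

Name: Audio-Visual Traffic Anomaly Understanding Dataset
Creator: South China University of Technology
License: not specified

Indexed by the National Basic Science Data Center on 2026-01-17

Download link: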

https://nbsdc.cn/general/dataDetail?id=6967bdb7195d26230e9b11ce&type=1

Resource description:

The Audio-Visual Traffic Anomaly Understanding Dataset (AV-TAU) is the first large-scale audio-visual multimodal benchmark dataset built specifically to improve traffic anomaly detection and response. It was developed by a joint team from institutions including The Chinese University of Hong Kong and South China University of Technology, with support from the National Key R&D Program of China and the Innovation and Technology Fund of Hong Kong. Most existing traffic datasets lack audio cues, which makes accidents in occluded or complex dynamic scenes hard to identify accurately from visual information alone; AV-TAU was built to fill that gap, which gives it both research value and practical relevance.

For data collection and processing, the project team searched public online platforms such as YouTube and Bilibili between 2023 and 2025, using keywords such as "car crash" and "brake sound". A group of 65 professional annotators then cleaned and clipped the collected videos. To ensure annotation quality and accuracy, fine-grained text annotations were produced under a strict "three-person independent cross-validation" mechanism, covering five core tasks: anomaly description, causal reasoning, temporal localization, prevention strategies, and emergency response.

AV-TAU is intended for research on intelligent transportation system safety, long-tail scenario analysis for autonomous driving, and training of multimodal large language models (MLLMs). It is drawn from dashcam and surveillance camera footage recorded in real road environments, and captures traffic anomaly events with video and synchronized audio (such as collision, braking, and horn sounds), together with 149,000 structured question-answer pairs. The dataset contains 29,865 video files totaling approximately 38.53 GB.
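As a rough sanity check on the figures quoted in the description, the short Python sketch below derives two averages: question-answer pairs per video and mean file size per video. The constant and function names are illustrative and not part of the dataset; the sketch also assumes that the 149,000 figure is rounded and that "GB" means decimal gigabytes, neither of which the listing states.

```python
# Figures quoted in the dataset description.
NUM_VIDEOS = 29_865
NUM_QA_PAIRS = 149_000   # quoted as "149,000"; assumed to be a rounded figure
TOTAL_SIZE_GB = 38.53    # approximate; decimal gigabytes assumed (1 GB = 1000 MB)

# The five annotation tasks named in the description.
TASKS = [
    "anomaly description",
    "causal reasoning",
    "temporal localization",
    "prevention strategies",
    "emergency response",
]


def qa_pairs_per_video(num_pairs: int = NUM_QA_PAIRS, num_videos: int = NUM_VIDEOS) -> float:
    """Average number of question-answer pairs per video file."""
    return num_pairs / num_videos


def mean_file_size_mb(total_gb: float = TOTAL_SIZE_GB, num_videos: int = NUM_VIDEOS) -> float:
    """Average video file size in megabytes, under the decimal-GB assumption."""
    return total_gb * 1000 / num_videos


if __name__ == "__main__":
    print(f"QA pairs per video: {qa_pairs_per_video():.2f}")    # 4.99
    print(f"Mean file size: {mean_file_size_mb():.2f} MB")      # 1.29 MB
    print(f"Annotation tasks: {len(TASKS)}")                    # 5
```

The average of roughly five question-answer pairs per video is consistent with one pair per core task, although the listing does not say how pairs are distributed across videos or tasks.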

Provider:

South China University of Technology

Aggregated summary

Dataset introduction

Background and challenges

Background overview

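This dataset is the first large-scale audio-visual multimodal benchmark built to improve traffic anomaly detection and response. It combines video with synchronized audio (such as collision and braking sounds) to compensate for the lack of audio cues in existing data, targeting accurate identification in occluded or complex dynamic scenes. It is based on dashcam and surveillance camera footage from real road environments and contains 29,865 video files (approximately 38.53 GB) and 149,000 structured question-answer pairs. It mainly serves research on intelligent transportation system safety, autonomous driving analysis, and multimodal large language model training.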

The summary above was collected and generated by 遇见数据集.