mipal/AVATAR
Source: Hugging Face | Updated: 2025-11-03 | Indexed: 2025-08-30
Download link:
https://hf-mirror.com/datasets/mipal/AVATAR
Description:
AVATAR is a video-centric benchmark dataset for evaluating audio-visual localization in complex, dynamic real-world scenarios. It contains 5,000 videos and 24,266 frames with high-resolution temporal annotations and supports four challenging evaluation settings: Single-sound, Mixed-sound, Multi-entity, and Off-screen. The dataset spans 80 audio-visual categories, enabling a comprehensive evaluation of how well models generalize across varied audio-visual contexts.
Provider:
mipal