AVSBench-Object

Source: arXiv; indexed 2025-09-30

Download link:

https://gewu-lab.github.io/stepping_stones/

Resource overview:

AVSBench-Object comprises two subsets that target the S4 and MS3 subtasks of audio-visual segmentation and provide pixel-level binary labels for the sound sources in each video. The V2 subset additionally provides semantic labels covering 71 categories, including humans, musical instruments, animals, and tools. The S4 subset contains 4,932 videos (3,452 for training, 740 for validation, and 740 for testing), and the MS3 subset contains 424 videos (296 for training, 64 for validation, and 64 for testing). The task the dataset supports is audio-visual segmentation.
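To make the idea of pixel-level binary labels concrete, here is a minimal sketch of comparing a predicted mask against a ground-truth mask with intersection-over-union. The function name `binary_iou`, the nested-list mask representation, and the toy 3x4 frame are assumptions made for illustration; they are not part of the dataset's own tooling, and the dataset's official evaluation code may differ.

```python
def binary_iou(pred, target):
    """Intersection-over-union of two equally sized binary masks.

    Masks are nested lists of 0/1 values (a hypothetical representation
    chosen to keep the example dependency-free).
    """
    same_rows = len(pred) == len(target)
    if not same_rows or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0

    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Toy 3x4 frame: 1 marks a pixel that belongs to the sounding object.
target = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

score = binary_iou(pred, target)
print(f"IoU = {score:.2f}")
```

In this toy case three pixels are marked in both masks and five are marked in at least one, so the score is 3/5.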

Dataset introduction

Background overview

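The AVSBench-Object dataset focuses on the audio-visual semantic segmentation task, which aims at pixel-level localization of sound sources and semantic understanding of the scene. Models trained with the Stepping Stones training strategy and the AAVS framework are reported to achieve leading performance on the AVSBench benchmark.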

The above content was collected and summarized by 遇见数据集.