Ref-AVS
Indexed from arXiv on 2025-09-30
Download link:
https://gewu-lab.github.io/Ref-AVS
Resource description:
Ref-AVS is a dataset built for the referential audio-visual segmentation task. It provides pixel-level annotations for the objects described by the corresponding multimodal-cue expressions. The dataset covers audible objects from 48 categories and static objects from 3 categories. Its expressions are varied, combining auditory, visual, and temporal information, and each expression is graded by difficulty (easy, medium, or hard). The dataset spans a wide range of objects and scenes, with an emphasis on complexity and interaction.
Provided by:
Gewu Lab



