Sony-TAu Realistic Spatial Soundscapes 2023 (STARSS23)
Source: arXiv · Updated: 2023-11-14 · Indexed: 2024-06-21
Download link:
https://zenodo.org/record/7880637
Resource description:
STARSS23 is an audio-visual dataset created by Sony Group Corporation and Tampere University. It contains multichannel audio, video, and spatiotemporal annotations of sound events, recorded with 57 participants across 16 rooms for a total of more than 7 hours, and is intended to support audio-visual sound event localization and detection (SELD). The sound scenes were produced by participants following generic instructions designed to ensure that sound events are sufficiently active and frequent. STARSS23 also provides human-annotated temporal activation labels and human-confirmed direction-of-arrival (DoA) labels derived from the tracking results of a motion capture system. The dataset is suited to evaluating audio-visual SELD systems, and results on it demonstrate the benefit of using visual object positions in audio-visual SELD.
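The description mentions temporal activation labels and DoA labels. As a minimal sketch, assuming a DCASE-style CSV label layout (one row per active event per 100 ms label frame: frame index, class index, source index, azimuth, elevation in degrees, with any extra columns ignored), the labels could be read and converted to Cartesian DoA unit vectors as below. The column layout, frame length, and axis convention are assumptions to verify against the dataset's own README; `SELDEvent` and `parse_labels` are hypothetical helpers, not part of any official toolkit.

```python
import csv
import io
import math
from dataclasses import dataclass

# Assumed label-frame resolution (100 ms); verify against the dataset documentation.
FRAME_SEC = 0.1


@dataclass
class SELDEvent:
    time_sec: float       # start time of the label frame, in seconds
    class_idx: int        # sound event class index
    source_idx: int       # index separating simultaneous sources of the same class
    azimuth_deg: float    # assumed: degrees, counter-clockwise from the front
    elevation_deg: float  # assumed: degrees, positive upwards

    def doa_unit_vector(self):
        """Convert (azimuth, elevation) in degrees to a unit vector (x, y, z)."""
        az = math.radians(self.azimuth_deg)
        el = math.radians(self.elevation_deg)
        return (math.cos(el) * math.cos(az),
                math.cos(el) * math.sin(az),
                math.sin(el))


def parse_labels(text):
    """Parse hypothetical label rows: frame, class, source, azimuth, elevation[, ...]."""
    events = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        frame, cls, src, az, el = (int(v) for v in row[:5])
        events.append(SELDEvent(frame * FRAME_SEC, cls, src, az, el))
    return events


if __name__ == "__main__":
    sample = "10,2,0,90,0\n11,2,0,-30,15\n"
    for ev in parse_labels(sample):
        print(ev.time_sec, ev.class_idx, ev.doa_unit_vector())
```

Representing DoA as a unit vector rather than raw angles avoids the wrap-around at ±180° azimuth, which makes distances between predicted and reference directions easier to compute.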
Provided by:
Sony Group Corporation and Tampere University
Created:
2023-06-15