AV16.3

Mendeley Data | Updated 2024-05-10 | Indexed 2024-06-28

Download link:

https://zenodo.org/records/4449274

Resource summary:

Description: The AV16.3 corpus is an audio-visual corpus of 43 real indoor multi-speaker recordings, designed to test algorithms for audio-only, video-only and audio-visual speaker localization and tracking. All recordings use real human speakers. The recordings were chosen to push algorithms to their limits and to cover a wide range of application scenarios, such as meetings and surveillance. The emphasis is on overlapped speech and multiple moving speakers: most recordings are dynamic scenarios with one or several moving speakers, and a few meeting scenarios with mostly seated speakers are also included.

Technical details: Recordings were made with two 8-microphone uniform circular arrays (16 kHz sampling frequency) and three digital cameras (25 frames per second) placed around the meeting room, hence the name "AV16.3". Whenever possible, each speaker also wore a lapel microphone. All sensors were synchronized. The three cameras were also calibrated and used to determine the ground-truth 3-D location of each speaker's mouth, with a maximum error of 1.2 cm. To the best of our knowledge, this annotated audio-visual corpus was the first to be made publicly available (recorded in fall 2003, published in June 2004 at the MLMI'04 workshop).

Acknowledgement: "AV16.3: an Audio-Visual Corpus for Speaker Localization and Tracking", by Guillaume Lathoud, Jean-Marc Odobez and Daniel Gatica-Perez, in Proceedings of the MLMI'04 Workshop, 2004.
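The technical details above fix two numbers that matter when pairing audio with video: at 16 kHz and 25 frames per second, each video frame spans 16000 / 25 = 640 audio samples. The Python sketch below shows that bookkeeping together with the geometry of one 8-microphone uniform circular array and the far-field arrival delays that array-based localization typically relies on. It is a hedged illustration, not the corpus's own tooling: all function names are hypothetical, audio and video are assumed to start at the same instant, the speed of sound is assumed to be 343 m/s, and the array radius is a caller-supplied placeholder because the description does not state it.

```python
import math

AUDIO_FS_HZ = 16_000    # audio sampling frequency stated for AV16.3
VIDEO_FPS = 25          # camera frame rate stated for AV16.3
N_MICS = 8              # microphones per circular array
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def samples_per_frame(fs_hz: int = AUDIO_FS_HZ, fps: int = VIDEO_FPS) -> int:
    """Number of audio samples covered by one video frame."""
    if fs_hz % fps:
        raise ValueError("sampling rate is not an integer multiple of the frame rate")
    return fs_hz // fps


def frame_to_sample_range(frame_idx: int) -> tuple[int, int]:
    """Half-open [start, end) audio sample range for a 0-based video frame index.

    Assumes (hypothetically) that audio sample 0 and video frame 0 start together.
    """
    spf = samples_per_frame()
    return frame_idx * spf, (frame_idx + 1) * spf


def uca_positions(radius_m: float, n_mics: int = N_MICS) -> list[tuple[float, float]]:
    """2-D coordinates of a uniform circular array centred at the origin.

    radius_m is a placeholder supplied by the caller; the corpus description
    does not give the array radius.
    """
    return [
        (radius_m * math.cos(2 * math.pi * k / n_mics),
         radius_m * math.sin(2 * math.pi * k / n_mics))
        for k in range(n_mics)
    ]


def far_field_delays(radius_m: float, azimuth_rad: float, n_mics: int = N_MICS) -> list[float]:
    """Arrival time (s) of a plane wave from azimuth_rad at each microphone,
    relative to the array centre; negative means the microphone hears it first.
    Assumes a far-field source in the plane of the array.
    """
    ux, uy = math.cos(azimuth_rad), math.sin(azimuth_rad)
    return [-(x * ux + y * uy) / SPEED_OF_SOUND for x, y in uca_positions(radius_m, n_mics)]
```

For example, `frame_to_sample_range(2)` returns `(1280, 1920)`. Opposite microphones on a circle of radius r differ by at most 2r/c in arrival time; with a placeholder r = 0.1 m that is about 0.58 ms, or roughly 9.3 samples at 16 kHz.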

Created: 2023-06-28

Dataset Introduction

Background and Challenges

Background Overview

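AV16.3 is an audio-visual corpus designed for testing speaker localization and tracking algorithms. It contains 43 real indoor multi-speaker recordings, with particular attention to overlapping speech and moving speakers. The data were recorded with multiple microphone arrays and synchronized cameras and provide high-accuracy 3-D speaker locations, making the corpus suitable for a range of application scenarios such as meetings and surveillance.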

The above content was collected and summarized by 遇见数据集.
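The high-accuracy 3-D speaker positions in AV16.3 come, according to its description, from three calibrated cameras. As a hedged illustration of how calibrated views can yield a 3-D point, the sketch below uses standard linear (DLT) triangulation with NumPy. The description does not say which triangulation method the authors used, so this is not their procedure; the function names, projection matrices, camera centres and mouth position are all hypothetical, and the check uses noise-free synthetic data, so it says nothing about the reported 1.2 cm maximum error.

```python
import numpy as np


def triangulate_dlt(projections, points_2d):
    """Linear (DLT) triangulation of one 3-D point seen in N >= 2 calibrated views.

    projections: 3x4 camera projection matrices, one per view (hypothetical inputs)
    points_2d:   matching (u, v) pixel coordinates, one per view
    Returns the estimated point in world coordinates as a length-3 array.
    """
    if len(projections) != len(points_2d) or len(projections) < 2:
        raise ValueError("need one 2-D point per projection and at least two views")
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        P = np.asarray(P, dtype=float)
        # Each view gives two linear equations in the homogeneous point X:
        # (u * P[2] - P[0]) @ X = 0 and (v * P[2] - P[1]) @ X = 0
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # Least-squares solution of A @ X = 0 with ||X|| = 1:
    # the right singular vector for the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, X):
    """Project a 3-D world point with a 3x4 matrix P; returns pixel (u, v)."""
    x = np.asarray(P, dtype=float) @ np.append(np.asarray(X, dtype=float), 1.0)
    return x[:2] / x[2]


# Synthetic check with three made-up cameras (identity rotation, different centres).
K = np.array([[800.0, 0.0, 360.0], [0.0, 800.0, 288.0], [0.0, 0.0, 1.0]])
centres = [np.zeros(3), np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
Ps = [K @ np.hstack([np.eye(3), -c.reshape(3, 1)]) for c in centres]
mouth = np.array([0.4, 0.3, 2.5])  # hypothetical mouth position in metres
pix = [project(P, mouth) for P in Ps]
est = triangulate_dlt(Ps, pix)     # noise-free input, so est should match mouth closely
```

With real detections the 2-D points are noisy, and the SVD step then returns the algebraic least-squares point rather than an exact intersection; pixel coordinates are often normalized first to improve conditioning.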