WildVVAD
Source: arXiv | Updated: 2020-10-16 | Indexed: 2024-06-21
Download link:
https://team.inria.fr/perception/research/vvad/
Description:
The WildVVAD dataset was created by the Grenoble-Alpes center of Inria (the French National Institute for Research in Computer Science and Automation) and Université Grenoble Alpes. Its purpose is to support predicting whether a person is speaking from visual features alone. It contains 13,000 two-second video clips that cover a wide range of head poses, resolutions, and video quality levels, as well as speakers of different ethnicities, ages, and genders. Data collection and annotation were automated by combining audio voice activity detection (VAD) with face detection. WildVVAD is mainly used for visual voice activity detection (VVAD), which is particularly useful when the audio signal is missing or difficult to analyze.
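Provided by:
Grenoble-Alpes center of Inria (the French National Institute for Research in Computer Science and Automation) and Université Grenoble Alpes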
Created:
2020-09-23



