
RealVAD: A Real-world Dataset for Voice Activity Detection

Source: Mendeley Data. Updated: 2024-03-27. Indexed: 2024-06-28.
Download link:
https://zenodo.org/record/3928151
Description:
The task of automatically detecting “who is speaking and when” is broadly referred to as Voice Activity Detection (VAD). Automatic VAD is an important task and a foundation for several domains, e.g., the analysis of human-human and human-computer/robot/virtual-agent interaction, and industrial applications.

The RealVAD dataset is constructed from a YouTube video of a panel discussion lasting approx. 83 minutes. The audio is available from a single channel. One static camera captures all panelists, the moderator and the audience.

Particular aspects of the RealVAD dataset are:
The panelists have different nationalities (British, Dutch, French, German, Italian, American, Mexican, Colombian, Thai). This allows studying the effect of variety in ethnic origin on automatic VAD.
There is a gender balance: four female and five male panelists.
The panelists sit in two rows and may gaze at the audience, other panelists, their laptop, the moderator or anywhere in the room, whether speaking or not. They were therefore captured not only in frontal view but also in side view, depending on their momentary posture and head orientation.
The panelists move freely and perform various spontaneous actions (e.g., drinking water, checking their cell phone, using their laptop), resulting in different postures.
The panelists' body parts are sometimes partially occluded by their own or another person's body parts or belongings (e.g., a laptop).
There are also natural changes in illumination and shadows on the wall behind the panelists in the back row.
For the panelists in the front row in particular, background motion sometimes occurs when the person(s) behind them move.

The annotations include:
Upper-body detections of the nine panelists in bounding-box form.
The associated VAD ground truth (speaking, not-speaking) for the nine panelists.
Acoustic features extracted from the video: MFCC and raw filterbank energies.

All information regarding the annotations is given in the ReadMe.txt and Acoustic Features README.txt files.

When using this dataset for your research, please cite the following paper in your publication: C. Beyan, M. Shahid and V. Murino, "RealVAD: A Real-world Dataset and A Method for Voice Activity Detection by Body Motion Analysis", in IEEE Transactions on Multimedia, 2020.
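
To illustrate how per-person VAD ground truth of the kind described above answers "who is speaking and when", the following minimal Python sketch turns a sequence of per-frame speaking/not-speaking labels into time segments. The label encoding (1 = speaking, 0 = not-speaking), the frame rate, and the function name labels_to_segments are assumptions made for illustration only. The dataset's actual annotation format is documented in its ReadMe.txt and may differ.

```python
from itertools import groupby


def labels_to_segments(labels, fps):
    """Convert per-frame 0/1 labels into (start_s, end_s) speaking segments.

    Assumption: labels is a sequence with 1 = speaking and 0 = not-speaking,
    one entry per video frame, and fps is the video frame rate. Neither is
    taken from the dataset's documentation.
    """
    segments = []
    frame = 0  # index of the first frame of the current run
    for speaking, run in groupby(labels):
        length = len(list(run))
        if speaking:
            # A run of consecutive speaking frames becomes one segment.
            segments.append((frame / fps, (frame + length) / fps))
        frame += length
    return segments


# Hypothetical labels for one panelist at an assumed 25 frames per second.
example = [0, 0, 1, 1, 1, 0, 1, 1]
print(labels_to_segments(example, fps=25.0))
```

Running the same function once per panelist would give one list of speaking segments per person, which is a convenient form for comparing predictions against the ground truth.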
Created:
2023-06-28
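Background and Challenges
Background Overview
RealVAD is a real-world dataset for voice activity detection, built from a YouTube panel-discussion video of about 83 minutes and featuring nine panelists of different nationalities with a balanced gender mix. Its defining characteristic is the diversity of a real scene, including varied postures, occlusions, illumination changes and background motion. It provides upper-body detection bounding boxes, VAD ground truth and acoustic features, and is suited to studying automatic voice activity detection across multi-ethnic backgrounds.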
The content above was collected and summarized by 遇见数据集.