VVAD-LRS3 Dataset
arXiv · Updated 2021-09-28 · Indexed 2024-06-21
Download link:
https://www.kaggle.com/adrianlubitz/vvadlrs3
Overview:
VVAD-LRS3 is a large-scale dataset for visual voice activity detection (VVAD), created jointly by the University of Bremen and the German Research Center for Artificial Intelligence (DFKI). With more than 44,000 samples, it is among the largest VVAD datasets currently available. It provides face and lip images together with face and lip features, all derived from TED talk videos. Samples were extracted from the LRS3 dataset using automatic annotation and then balanced to a 1:1 ratio of positive to negative samples. The dataset mainly targets human-computer interaction, particularly robotics, where detecting whether a person is speaking helps make interaction more natural and efficient.
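The 1:1 balancing of positive and negative samples can be illustrated with a minimal sketch. This listing does not say how the balancing was actually performed, so random downsampling of the majority class is an assumption here, and the function name `balance_binary` and the toy `(clip_id, is_speaking)` tuples are hypothetical stand-ins rather than the dataset's real format.

```python
import random
from collections import defaultdict


def balance_binary(samples, label_of, seed=0):
    """Downsample the larger class so both classes have equal counts.

    Assumption: balancing by random downsampling; the dataset's actual
    procedure is not described in this listing.
    """
    by_label = defaultdict(list)
    for sample in samples:
        by_label[bool(label_of(sample))].append(sample)

    # Size of the smaller class determines how many we keep from each.
    n = min(len(by_label[True]), len(by_label[False]))

    rng = random.Random(seed)  # fixed seed for reproducibility
    balanced = rng.sample(by_label[True], n) + rng.sample(by_label[False], n)
    rng.shuffle(balanced)
    return balanced


# Toy data (hypothetical): 10 speaking clips and 20 non-speaking clips.
toy = [(i, i % 3 == 0) for i in range(30)]
balanced = balance_binary(toy, label_of=lambda s: s[1])
```

After the call, `balanced` holds the same number of speaking and non-speaking samples, which is the property the 1:1 ratio describes.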
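Provided by:
Department of Computer Science, University of Bremen; Robotics Innovation Center, German Research Center for Artificial Intelligence (DFKI), Bremen
Created: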
2021-09-28



