VoxCeleb2 Audio-Mesh Dataset

Name: VoxCeleb2 Audio-Mesh Dataset
Creator: Apple Inc.
Published: 2023-11-30 09:14:43
License: Not specified

Source: arXiv. Updated: 2023-11-30. Indexed: 2024-08-06.

Download link:

http://arxiv.org/abs/2311.18168v1

Description:

The VoxCeleb2 Audio-Mesh dataset is a large-scale paired audio-mesh dataset that Apple Inc. built from the VoxCeleb2 video dataset using monocular facial reconstruction. It covers thousands of speakers, far more than current public benchmark datasets. The dataset is intended to support research on probabilistic speech-driven 3D facial animation, in particular capturing and modeling the complex many-to-many relationship between speech and facial motion in real-world recordings. Applications include generating diverse speech-driven 3D facial motion and improving downstream audiovisual models, for example audiovisual speech recognition in noisy environments.

Provider:

Apple Inc.

Created:

2023-11-30