VoxCeleb2 Audio-Mesh Dataset
Source: arXiv. Updated: 2023-11-30. Indexed: 2024-08-06.
Download link:
http://arxiv.org/abs/2311.18168v1
Resource description:
The VoxCeleb2 Audio-Mesh dataset is a large-scale paired audio-mesh dataset that Apple Inc. created from the VoxCeleb2 video dataset using monocular facial reconstruction techniques. It contains data from thousands of speakers, far more than current public benchmark datasets. The dataset was built to support research on probabilistic speech-driven 3D facial animation, in particular capturing and modeling the complex, many-to-many relationship between speech and facial motion in real-world data. Its applications include generating diverse speech-driven 3D facial motion and improving downstream audiovisual models, for example audiovisual speech recognition in noisy environments.
Provider:
Apple Inc.
Created:
2023-11-30



