VOCA: a 3D facial animation model and dataset for virtual reality that captures, learns, and synthesizes 3D speaking styles

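Name: VOCA: a 3D facial animation model and dataset for virtual reality that captures, learns, and synthesizes 3D speaking styles
Creator: 帕依提提
License: no description provided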

Indexed by 帕依提提 on 2024-03-04

Download link:

https://www.payititi.com/opendatasets/show-970.html

Resource description:

VOCA is a simple, generic speech-driven facial animation framework that works across a range of identities. The codebase demonstrates how to synthesize realistic character animation from an arbitrary speech signal and a static character mesh. Audio-driven 3D facial animation has been widely explored, but achieving realistic, human-like performance remains unsolved, largely because of the lack of available 3D datasets, models, and standard evaluation metrics. To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps, with synchronized audio from 12 speakers. We then train a neural network on this dataset that factors identity out of facial motion. The learned model, VOCA (Voice Operated Character Animation), takes any speech signal as input, including speech in languages other than English, and realistically animates a wide range of adult faces. Conditioning on subject labels during training allows the model to learn a variety of realistic speaking styles. VOCA also provides animator controls for altering speaking style, identity-dependent facial shape, and pose (i.e., head, jaw, and eyeball rotations) during animation. To our knowledge, VOCA is the only realistic 3D facial animation model that can be readily applied to unseen subjects without retargeting. This makes VOCA suitable for in-game video, virtual reality avatars, or any scenario in which the speaker, speech, or language is not known in advance. We make the dataset and model available for research purposes.

Free for non-commercial and scientific research purposes. By using this code, you acknowledge that you have read the license terms (https://voca.is.tue.mpg.de/license), understand them, and agree to be bound by them. If you do not agree with these terms and conditions, you must not use the code.
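As a rough sense of scale, the capture figures quoted in the description (about 29 minutes at 60 fps, 12 speakers) can be turned into an approximate frame count. The sketch below is only back-of-the-envelope arithmetic: the function name is hypothetical, the 29-minute figure is itself approximate, and the even split across speakers is an assumption that the description does not state.

```python
# Back-of-the-envelope size estimate from the figures quoted in the description.
CAPTURE_MINUTES = 29      # "about 29 minutes" (approximate in the source)
FRAMES_PER_SECOND = 60    # capture rate quoted in the source
NUM_SPEAKERS = 12         # number of recorded speakers quoted in the source


def estimate_frames(minutes: float, fps: int) -> int:
    """Approximate number of scan frames for a capture of the given length."""
    return round(minutes * 60 * fps)


total_frames = estimate_frames(CAPTURE_MINUTES, FRAMES_PER_SECOND)

# Assumption (not stated in the source): capture time is split evenly
# across speakers. Real per-speaker totals may differ.
frames_per_speaker = total_frames // NUM_SPEAKERS

print(total_frames)        # 104400
print(frames_per_speaker)  # 8700
```

Under these assumptions the dataset holds on the order of 100,000 scan frames, or roughly 8,700 per speaker.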

Provided by:

帕依提提

Collected summary

Dataset introduction

Background and challenges

Background overview

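The VOCA dataset focuses on 3D facial animation for virtual reality. It contains about 29 minutes of 4D scans captured at 60 fps, with synchronized audio from 12 speakers, and is used to train a speech-driven facial animation model. The model works across multiple identities, accepts arbitrary speech signals as input, and lets animators adjust speaking style, facial shape, and pose. It is suited to games, virtual reality avatars, and similar scenarios, and is offered for non-commercial and scientific research use.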

The summary above was collected and generated by 遇见数据集.