Learning realistic lip motions for humanoid face robots
Source: DataCite Commons. Updated: 2026-01-28. Indexed: 2026-04-25.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.j6q573nrc
Description:
Lip motion carries outsized importance in human communication,
capturing nearly half of our visual attention during conversation. Yet
anthropomorphic robots often fail to achieve lip-audio synchronization,
resulting in clumsy and lifeless lip behaviors. Two fundamental barriers
underlie this challenge. First, robotic lips typically lack the mechanical
complexity required to reproduce nuanced human mouth movements; second,
existing synchronization methods depend on manually predefined movements
and rules, restricting adaptability and realism. Here, we present a
humanoid robot face designed to overcome these limitations, featuring soft
silicone lips actuated by a ten-degree-of-freedom (10-DoF) mechanism. To
achieve lip synchronization without predefined movements, we use a
self-supervised learning pipeline based on a Variational Autoencoder (VAE)
combined with a Facial Action Transformer, enabling the robot to
autonomously infer more realistic lip trajectories directly from speech
audio. Our experimental results suggest that this method achieves more
visually coherent lip-audio synchronization than simple heuristics such
as amplitude-based baselines. Furthermore, the learned
synchronization successfully generalizes across multiple linguistic
contexts, enabling robot speech articulation in ten languages unseen
during training.
Provider:
Dryad
Created:
2026-01-07



