Adversarial Generation of Voice-Controlled Speaking Face Videos Based on Modal Affine Fusion

Source: China Scientific Data. Updated 2026-02-09; indexed 2026-04-25.

Download link:

https://www.sciengine.com/AA/doi/10.19678/j.issn.1000-3428.0069992

Resource description:

Generating speaking face videos from speech, which requires processing both audio and visual modalities, is a current research hotspot. A key challenge is achieving precise alignment between lip movements in the video and the input audio. To address this problem, this study proposes an end-to-end adversarial model for speech-controlled speaking face video generation. The model consists mainly of a generator based on modal affine fusion, a visual quality discriminator, and a lip synchronization discriminator. The generator injects audio information during face feature decoding through the Modal Affine Fusion Block (MAFBlock), which fuses audio information with face information effectively and gives the audio better control over the generated speaking face video. Spatial and channel attention mechanisms are incorporated to strengthen the model's focus on local facial regions. The model employs a dual-discriminator strategy to improve both visual quality and lip synchronization accuracy. The lip synchronization discriminator constrains lip movements by evaluating the similarity between the audio and the generated lip shapes without changing the overall contour or facial details, thereby providing finer control over lip movement generation. The visual quality discriminator assesses the realism of the generated image frames to improve image quality. The proposed model is compared experimentally with several representative existing models on two audiovisual datasets. On the LRS2 validation set, it achieves an LSE-C score of 8.128 and an LSE-D score of 6.112, improvements of 4.3% and 4.4% over the baseline, respectively. On the LRS3 validation set, it achieves LSE-C and LSE-D scores of 7.963 and 6.259, improvements of 6.2% and 6.9% over the baseline, respectively.
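The description above does not give the MAFBlock equations or the discriminator's scoring function, so the following is only a minimal sketch under stated assumptions. It assumes that "affine fusion" means feature-wise modulation, where the audio embedding predicts a per-channel scale and shift that are applied to the face features, and that the lip synchronization score is a cosine similarity between an audio embedding and a lip embedding. The names `linear`, `affine_fuse`, and `cosine_similarity`, along with all shapes and weights, are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math


def linear(vec, weights, bias):
    """Dense layer: `weights` holds one row of coefficients per output unit."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def affine_fuse(face_feat, audio_feat, w_gamma, b_gamma, w_beta, b_beta):
    """Assumed form of affine fusion: out = gamma(audio) * face + beta(audio).

    face_feat  -- list of C channels, each a flat list of spatial values
    audio_feat -- list of A floats (audio embedding)
    """
    gamma = linear(audio_feat, w_gamma, b_gamma)  # one scale per channel
    beta = linear(audio_feat, w_beta, b_beta)     # one shift per channel
    return [[g * v + b for v in channel]
            for channel, g, b in zip(face_feat, gamma, beta)]


def cosine_similarity(a, b):
    """Assumed sync score: cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)  # epsilon guards against zero norms


# Toy inputs: 2 face channels with 2 spatial values each, 2-D audio embedding.
face = [[1.0, 2.0], [3.0, 4.0]]
audio = [0.5, -0.5]
zeros = [[0.0, 0.0], [0.0, 0.0]]

# With scale fixed at 1 and shift at 0, the audio leaves the face unchanged.
identity = affine_fuse(face, audio, zeros, [1.0, 1.0], zeros, [0.0, 0.0])

# With a non-trivial scale projection, the audio rescales each channel.
modulated = affine_fuse(face, audio, [[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0],
                        zeros, [1.0, 1.0])

# Sync score between a hypothetical audio embedding and lip embedding.
sync_score = cosine_similarity([1.0, 0.0], [1.0, 0.0])
```

In a trained model the scale and shift projections would be learned and applied at several decoding stages, and the embeddings fed to the similarity function would come from learned audio and lip encoders. The sketch fixes everything by hand only to show the data flow.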

Created:

2026-02-09
