Multi-Human Interactive Talking Dataset

Source: DataONE | Updated: 2025-05-23 | Indexed: 2025-11-01

Download link:

https://search.dataone.org/view/sha256:5d1074366ea100ff84da7fe92ece9d39796010ce2f7b78eca81235d05bb21b45

Description:

Existing studies on talking video generation have predominantly focused on single-person monologues or isolated facial animations, which limits their applicability to realistic multi-human interactions. To bridge this gap, we introduce MIT, a large-scale dataset specifically designed for multi-human talking video generation. We develop an automatic pipeline that collects and annotates multi-person conversational videos. The resulting dataset comprises 12 hours of high-resolution footage, with each clip featuring two to four speakers, and includes fine-grained annotations of body poses and speech interactions. It captures natural conversational dynamics in multi-speaker scenarios, offering a rich resource for studying interactive visual behaviors. To demonstrate the potential of MIT, we further propose CovOG, a baseline model for this novel task. It integrates a Multi-Human Pose Encoder (MPE), which handles varying numbers of speakers by aggregating individual pose embeddings, and an Interactive Audio Driver (IAD), which modulates head dynamics based on speaker-specific audio features. Together, these components showcase the feasibility and challenges of generating realistic multi-human talking videos, establishing MIT as a valuable benchmark for future research. The dataset and code will be publicly available.

Created:

2025-10-29
