In-the-wild Conversational Dataset
Source: OpenDataLab · Updated: 2026-05-24 · Indexed: 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/In-the-wild_Conversational_Dataset
Description:
During the COVID-19 pandemic, video-recorded interviews shifted to teleconferencing platforms with split-screen panels, where the host appears on one side of the screen and the interviewee on the other. This setup is particularly well suited to studying face-to-face communication, because both people face the camera directly. To capture a wide range of expressions across diverse settings and demographic groups, we extracted facial motion and audio from 72 hours of video across six YouTube channels. Each channel features a large number of interviewees and hosts from varied backgrounds. We used DECA, a state-of-the-art facial expression extraction method, to recover 3D head poses and expression coefficients from in-the-wild videos. DECA estimates pose, expression, and shape parameters based on the FLAME 3DMM. The 3DMM defines 50 expression coefficients and a 3D jaw rotation (d_m = 50 + 3 = 53), as well as 3D head rotations in Euler angles, as described in Section 3.1. For audio, we applied source separation to isolate the speaker's voice. We used these expression coefficients, head poses, and speaker-only audio as pseudo-ground truth to train our codebook (Equation 6) and prediction model (Equation 10). For further details, please refer to the Supplementary Materials. We have released this large-scale, novel dataset.
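As a rough illustration of the per-frame dimensions described above, the sketch below groups 50 expression coefficients, a 3D jaw rotation, and a 3D head rotation into one record and concatenates the first two into a 53-dimensional motion vector. The class name `FrameFeatures`, its field names, and the concatenation order are assumptions made for illustration; they do not describe the dataset's actual file format or the authors' code.

```python
from dataclasses import dataclass
from typing import List

# Dimensions taken from the dataset description.
N_EXPRESSION = 50  # FLAME expression coefficients
N_JAW = 3          # 3D jaw rotation
N_HEAD_ROT = 3     # 3D head rotation, Euler angles
D_M = N_EXPRESSION + N_JAW  # facial motion dimension, 53


@dataclass
class FrameFeatures:
    """Hypothetical per-frame container (not the dataset's real schema)."""

    expression: List[float]
    jaw_rotation: List[float]
    head_rotation: List[float]

    def __post_init__(self) -> None:
        # Reject records whose sizes do not match the described dimensions.
        if len(self.expression) != N_EXPRESSION:
            raise ValueError(f"expected {N_EXPRESSION} expression coefficients")
        if len(self.jaw_rotation) != N_JAW:
            raise ValueError(f"expected {N_JAW} jaw rotation values")
        if len(self.head_rotation) != N_HEAD_ROT:
            raise ValueError(f"expected {N_HEAD_ROT} head rotation values")

    def motion_vector(self) -> List[float]:
        # Assumed ordering: expression coefficients first, then jaw rotation.
        return list(self.expression) + list(self.jaw_rotation)


frame = FrameFeatures(
    expression=[0.0] * N_EXPRESSION,
    jaw_rotation=[0.1, 0.0, 0.0],
    head_rotation=[0.0, 0.2, 0.0],
)
print(len(frame.motion_vector()))
```

Head rotation is kept as a separate field here because the description lists it apart from the 53-dimensional quantity; how the two are combined downstream is not stated in this listing.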
Provided by:
OpenDataLab
Created:
2023-02-13
Collected Summary
Dataset Introduction

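Background and Challenges
Background Overview
The dataset is built from 72 hours of remote video interviews recorded during the COVID-19 pandemic, extracted from six YouTube channels and covering participants from diverse backgrounds. DECA is used to recover 3D head poses and expression coefficients, and the audio is processed with source separation; together these serve as pseudo-ground truth for model training. The dataset was released in 2022 by Seoul National University and the University of California, Berkeley.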
The summary above was collected and generated by 遇见数据集.



