ViCo
Source: OpenDataLab | Updated: 2026-05-17 | Indexed: 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/ViCo
Resource summary:
The ViCo dataset is designed for generating context-aware visual facial expressions, with the target application of producing listener feedback (e.g., nodding, smiling) in face-to-face conversations. It covers 92 unique identities (67 speakers and 76 listeners) and 483 video-audio clips. It adopts a paired speaker-listener paradigm: the listener reacts in real time to the speaker's voice and video with feedback that reflects a given attitude (positive, neutral, or negative). Unlike conventional speech-to-gesture or talking-head generation, listener head generation takes the speaker's audio and visual signals as input and produces real-time non-verbal feedback such as head movements and facial expressions. The dataset supports a wide range of applications, including human-computer interaction, video-to-video translation, and cross-modal understanding and generation.
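To make the paired speaker-listener structure concrete, here is a minimal sketch of how one sample might be represented in Python. The record layout, field names (`clip_id`, `speaker_video`, `listener_video`, etc.) and the helper `count_identities` are hypothetical and not taken from the ViCo release, whose on-disk format is not described on this page. The helper also illustrates why the identity counts overlap: by inclusion-exclusion, 67 speakers + 76 listeners = 143 role slots, but only 92 unique identities, which implies 67 + 76 - 92 = 51 identities appear in both roles.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Attitude(Enum):
    # The three listener attitude labels named in the dataset description.
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"


@dataclass(frozen=True)
class PairedClip:
    # Hypothetical record layout; the actual ViCo file format may differ.
    clip_id: str
    speaker_id: str      # identity appearing as the speaker
    listener_id: str     # identity appearing as the listener
    speaker_video: str   # path to speaker video (model input)
    speaker_audio: str   # path to speaker audio (model input)
    listener_video: str  # path to listener video (generation target)
    attitude: Attitude   # attitude the listener's feedback should convey


def count_identities(clips: List[PairedClip]) -> Tuple[int, int, int]:
    """Return (#distinct speakers, #distinct listeners, #distinct identities overall)."""
    speakers = {c.speaker_id for c in clips}
    listeners = {c.listener_id for c in clips}
    # The union counts a person once even if they appear in both roles.
    return len(speakers), len(listeners), len(speakers | listeners)


if __name__ == "__main__":
    # Toy data: identity "B" is a listener in one clip and a speaker in another.
    clips = [
        PairedClip("c0", "A", "B", "a.mp4", "a.wav", "b.mp4", Attitude.POSITIVE),
        PairedClip("c1", "B", "C", "b.mp4", "b.wav", "c.mp4", Attitude.NEGATIVE),
    ]
    print(count_identities(clips))  # (2, 2, 3)
```

In this toy example, two speakers and two listeners yield only three unique people because "B" plays both roles; the same inclusion-exclusion reasoning accounts for the gap between ViCo's per-role counts and its 92 total identities.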
Provided by:
OpenDataLab
Created:
2022-10-24
Compiled Summary
Dataset Introduction

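Background and Challenges
Background Overview
The ViCo dataset focuses on context-aware visual facial expression generation, aiming to model listener feedback (e.g., nodding, smiling) in face-to-face conversations. It contains 92 identities and 483 audio-video clips in a paired speaker-listener format: the listener responds in real time to the speaker's audio and video with positive, neutral, or negative non-verbal feedback (e.g., head movements and facial expressions). It supports applications such as human-computer interaction and video-to-video translation.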
The above content was collected and summarized by 遇见数据集.



