Multi-Angle Lip-Shape Multimodal Video Data of 202 People [数据堂]

Name: Multi-Angle Lip-Shape Multimodal Video Data of 202 People [数据堂]
Creator: shujutang
Published: 2024-05-31 15:14:31
License: No description available

Source: OpenDataLab. Updated 2024-05-31. Indexed 2024-06-01.

Download link:

https://opendatalab.org.cn/shujutang/shujutang1298

Resource description:

A multimodal video dataset of lip shapes from 202 subjects, recorded from multiple viewing angles. The data was collected indoors under two lighting conditions, natural light and fluorescent light, with smartphones as the recording devices. Coverage includes multiple scenes, different age groups, and 13 shooting angles. The spoken language is Mandarin Chinese, and the recorded content is general-domain with no restriction on topic. The dataset can be used for research on multimodal learning algorithms in the speech and image domains.

Provider:

shujutang

Created:

2024-05-31

Collected summary

Dataset introduction

Background and challenges

Background overview

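The dataset contains multi-angle lip-shape multimodal video data from 202 people. Each person recorded 13 audio-video clips from different angles, accompanied by 1 txt document. The listing describes the subjects as Asian (Indonesia) and states that the data covers their gender and age distribution. Recording took place indoors under natural light and fluorescent light, using smartphones, at a resolution of 1920*1080. The speech is in Mandarin Chinese. Each video is longer than 20 seconds and stored as .mp4, with audio at ≥16 kHz / 16-bit and a frame rate of 25-30 fps. Character accuracy is above 95%. The data is suited to research on multimodal learning algorithms in the speech and image domains.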
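The figures in the overview above can be turned into a simple acceptance check on downloaded files. The following is a minimal sketch, not part of the dataset's tooling: the `ClipMeta` record, its field names, the example file names, and the idea of one folder per subject are all assumptions made for illustration, since the listing does not describe the directory layout or any metadata format. Accepting either orientation for the 1920*1080 resolution is also an assumption.

```python
from dataclasses import dataclass
from pathlib import PurePath

# Thresholds taken from the dataset overview.
EXPECTED_RESOLUTION = (1080, 1920)   # stored sorted; orientation is an assumption
MIN_FPS, MAX_FPS = 25.0, 30.0
MIN_DURATION_S = 20.0                # videos are stated to be longer than this
MIN_SAMPLE_RATE_HZ = 16_000
MIN_BIT_DEPTH = 16
VIDEOS_PER_SUBJECT = 13
TXT_PER_SUBJECT = 1


@dataclass
class ClipMeta:
    """Hypothetical metadata record for one clip (field names are illustrative)."""
    path: str
    width: int
    height: int
    fps: float
    duration_s: float
    sample_rate_hz: int
    bit_depth: int


def check_clip(meta: ClipMeta) -> list[str]:
    """Return human-readable deviations of one clip from the stated spec."""
    problems = []
    if PurePath(meta.path).suffix.lower() != ".mp4":
        problems.append("file is not .mp4")
    if tuple(sorted((meta.width, meta.height))) != EXPECTED_RESOLUTION:
        problems.append(f"resolution is {meta.width}*{meta.height}")
    if not MIN_FPS <= meta.fps <= MAX_FPS:
        problems.append(f"frame rate {meta.fps} is outside 25-30 fps")
    if meta.duration_s <= MIN_DURATION_S:
        problems.append(f"duration {meta.duration_s}s is not above 20s")
    if meta.sample_rate_hz < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sample rate {meta.sample_rate_hz} Hz is below 16 kHz")
    if meta.bit_depth < MIN_BIT_DEPTH:
        problems.append(f"bit depth {meta.bit_depth} is below 16")
    return problems


def check_subject_files(filenames: list[str]) -> list[str]:
    """Check that one subject has 13 .mp4 clips and 1 .txt document."""
    suffixes = [PurePath(name).suffix.lower() for name in filenames]
    problems = []
    if suffixes.count(".mp4") != VIDEOS_PER_SUBJECT:
        problems.append(f"found {suffixes.count('.mp4')} .mp4 files, expected 13")
    if suffixes.count(".txt") != TXT_PER_SUBJECT:
        problems.append(f"found {suffixes.count('.txt')} .txt files, expected 1")
    return problems


if __name__ == "__main__":
    # Made-up example values, not real records from the dataset.
    clip = ClipMeta("subject_001/angle_01.mp4", 1920, 1080, 30.0, 24.5, 16_000, 16)
    print(check_clip(clip))
```

In practice the metadata would have to be read from the files with a media-probing tool of your choice; the sketch deliberately leaves that step out and only encodes the thresholds given in the overview.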

The content above was collected and summarized by 遇见数据集.