Nexdata/Multi-angle_Lip_Multimodal_Video_Data

Name: Nexdata/Multi-angle_Lip_Multimodal_Video_Data
Creator: Nexdata
Published: 2024-01-26 08:55:05
License: No description available

Source: Hugging Face; updated 2024-01-26; indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/Nexdata/Multi-angle_Lip_Multimodal_Video_Data

Resource summary:

Language tag (from the dataset card metadata): zh. The card's full contents are reproduced in the summary below. For more details, see https://www.nexdata.ai/datasets/1298?source=Huggingface

Provider:

Nexdata

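Original information summary

Dataset Card for Nexdata/Multi-angle_Lip_Multimodal_Video_Data

Description

202 People - Multi-angle Lip Multimodal Video Data. The collection environments include indoor natural-light scenes and indoor fluorescent-lamp scenes. The recording device is a cellphone. The data covers multiple scenes, different age groups, and 13 shooting angles. The language is Mandarin Chinese. The recording content is general-domain and unrestricted. The data can be used for research on multimodal learning algorithms in the speech and image domains.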

Specifications

Data size

202 people. For each person, audio and video data were collected from 13 different angles, plus 1 .txt document.
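As a rough completeness check, these counts imply 202 × 13 = 2,626 angle recordings and 202 text documents. The sketch below assumes one .mp4 file per angle with the audio embedded, and one folder per person. The card does not specify the delivery layout. The folder structure, constants, and function names are therefore hypothetical and only illustrate the idea.

```python
from pathlib import Path

# Counts from the card: 202 people, 13 angles each, 1 .txt document per person.
NUM_PEOPLE = 202
ANGLES_PER_PERSON = 13
TXT_PER_PERSON = 1


def expected_totals(num_people: int = NUM_PEOPLE) -> dict[str, int]:
    """Total file counts implied by the card, assuming one .mp4 per angle."""
    return {
        "mp4": num_people * ANGLES_PER_PERSON,
        "txt": num_people * TXT_PER_PERSON,
    }


def incomplete_people(root: Path) -> dict[str, tuple[int, int]]:
    """Return {person_folder: (mp4_count, txt_count)} for every person folder
    that does not contain exactly 13 .mp4 files and 1 .txt file.

    Assumes a hypothetical layout: root/<person_id>/*.mp4 and *.txt.
    """
    problems = {}
    for person_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        mp4 = len(list(person_dir.glob("*.mp4")))
        txt = len(list(person_dir.glob("*.txt")))
        if (mp4, txt) != (ANGLES_PER_PERSON, TXT_PER_PERSON):
            problems[person_dir.name] = (mp4, txt)
    return problems


print(expected_totals())  # {'mp4': 2626, 'txt': 202}
```

If the actual delivery stores audio separately from video, the glob patterns and per-person counts would need adjusting.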

People distribution

Race: Asian (Indonesia). Gender: 89 male, 113 female. Age: 165 people aged 18-30, 32 people aged 31-45, and 5 people aged 46-60.
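Both the gender breakdown and the age breakdown sum to 202, so they are internally consistent. The small check below verifies this and converts the counts to percentages of all participants. The percentages are derived here and are not stated in the card. The names `gender`, `age`, and `percentages` are illustrative.

```python
TOTAL_PEOPLE = 202

# Counts copied from the card.
gender = {"male": 89, "female": 113}
age = {"18-30": 165, "31-45": 32, "46-60": 5}


def percentages(counts: dict[str, int], total: int = TOTAL_PEOPLE) -> dict[str, float]:
    """Share of participants per group, rounded to one decimal place.

    Raises ValueError if the group counts do not add up to the total.
    """
    if sum(counts.values()) != total:
        raise ValueError(f"counts sum to {sum(counts.values())}, expected {total}")
    return {group: round(100 * n / total, 1) for group, n in counts.items()}


print(percentages(gender))  # {'male': 44.1, 'female': 55.9}
print(percentages(age))     # {'18-30': 81.7, '31-45': 15.8, '46-60': 2.5}
```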

Collection environment

Indoor natural-light scenes; indoor fluorescent-lamp scenes

Data diversity

Multiple scenes, different ages, different shooting angles

Device

Cellphone; resolution 1,920 × 1,080

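Collection angles

Frontal face; 3 left-profile angles; 3 right-profile angles; looking down; looking up; left profile looking down; right profile looking down; left profile looking up; right profile looking up. Audio and video from all 13 angles were captured simultaneously.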
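All 13 views were recorded at the same time. In principle, each session can therefore be treated as 13 parallel streams of the same speech, which is what makes the data usable for multi-view lip reading. The sketch below enumerates the views named in the card. The short labels (`front`, `left_1` to `left_3`, and so on) and the `<person_id>/<label>.mp4` file naming are hypothetical, because the card gives neither file names nor exact angle degrees.

```python
# Hypothetical labels for the 13 views listed in the card.
ANGLES = (
    ["front"]
    + [f"left_{i}" for i in range(1, 4)]   # 3 left-profile angles
    + [f"right_{i}" for i in range(1, 4)]  # 3 right-profile angles
    + ["down", "up", "left_down", "right_down", "left_up", "right_up"]
)

# Sanity check: 1 + 3 + 3 + 6 = 13 distinct views, matching the card.
assert len(ANGLES) == 13 and len(set(ANGLES)) == 13


def synchronized_views(person_id: str) -> dict[str, str]:
    """Map each view label to a hypothetical file path for one person.

    Since all views were captured simultaneously, the returned paths can be
    loaded together as parallel inputs (e.g. one batch item per session).
    """
    return {angle: f"{person_id}/{angle}.mp4" for angle in ANGLES}
```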

Recording content

General domain, unrestricted content

Language

Mandarin Chinese; each video is longer than 20 seconds

Data format

Video: .mp4, frame rate 25-30 fps. Audio: sample rate of at least 16 kHz, 16-bit.
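The format requirements, together with the resolution (Device) and minimum duration (Language) stated above, can be checked per clip. The sketch below makes several assumptions. First, video metadata is obtained separately, for example with a probing tool such as ffprobe, and passed in as a hypothetical `VideoInfo` record. Second, the audio track has already been extracted to PCM WAV, which Python's standard `wave` module can read. Third, both 1920 × 1080 and 1080 × 1920 are accepted, because phone recordings may be portrait; the card does not say.

```python
import wave
from dataclasses import dataclass


@dataclass
class VideoInfo:
    """Hypothetical per-clip metadata; how it is probed is left open."""
    width: int
    height: int
    fps: float
    duration_s: float


def video_meets_card(v: VideoInfo) -> list[str]:
    """Return the card requirements this clip violates (empty list = OK)."""
    issues = []
    # Accept either orientation (assumption: portrait phone video is allowed).
    if sorted((v.width, v.height)) != [1080, 1920]:
        issues.append(f"resolution {v.width}x{v.height}, expected 1920x1080")
    if not 25 <= v.fps <= 30:
        issues.append(f"frame rate {v.fps} fps outside 25-30")
    if v.duration_s <= 20:
        issues.append(f"duration {v.duration_s:.1f} s, expected more than 20 s")
    return issues


def wav_meets_card(path: str) -> list[str]:
    """Check an extracted PCM WAV track for >= 16 kHz and 16-bit samples."""
    issues = []
    with wave.open(path, "rb") as w:
        if w.getframerate() < 16000:
            issues.append(f"sample rate {w.getframerate()} Hz below 16000 Hz")
        # getsampwidth() is in bytes: 2 bytes per sample = 16-bit audio.
        if w.getsampwidth() != 2:
            issues.append(f"sample width {8 * w.getsampwidth()} bit, expected 16")
    return issues
```

Returning a list of violations, rather than a single boolean, makes it easier to report why a clip was rejected when screening many files.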

Accuracy rate

Sentence accuracy rate above 95%
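The card does not define "sentence accuracy". A common reading, assumed here, is the fraction of transcript sentences that are entirely correct when compared with a verified reference. The function below sketches that assumed definition. It ignores whitespace, since Mandarin text is not space-delimited. The function name and the normalization choice are illustrative.

```python
def sentence_accuracy(annotated: list[str], reference: list[str]) -> float:
    """Fraction of sentences whose annotation exactly matches the reference
    after removing all whitespace (assumed definition, not from the card)."""
    if len(annotated) != len(reference):
        raise ValueError("annotated and reference must have the same length")
    if not reference:
        raise ValueError("need at least one sentence")

    def norm(s: str) -> str:
        # Drop spaces, tabs, and newlines so spacing differences are ignored.
        return "".join(s.split())

    correct = sum(norm(a) == norm(r) for a, r in zip(annotated, reference))
    return correct / len(reference)
```

Under this definition, 19 correct sentences out of 20 give exactly 0.95, which would not satisfy a strict "more than 95%" threshold.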

Licensing information

Commercial license
