Nexdata/Multi-angle_Lip_Multimodal_Video_Data
Source: Hugging Face, updated 2024-01-26, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/Nexdata/Multi-angle_Lip_Multimodal_Video_Data
---
language:
- zh
---
# Dataset Card for Nexdata/Multi-angle_Lip_Multimodal_Video_Data
## Description
202 People - Multi-angle Lip Multimodal Video Data. The collection environments include indoor natural-light scenes and indoor fluorescent-lamp scenes. The recording device is a cellphone. The data covers multiple scenes, different ages, and 13 shooting angles. The language is Mandarin Chinese, and the recorded content is general-domain and unrestricted. The data can be used for research on multimodal learning algorithms in the speech and image domains.
For more details, please refer to the link: https://www.nexdata.ai/datasets/1298?source=Huggingface
# Specifications
## Data size
202 people; for each person, audio and video data were collected from 13 different angles, plus 1 .txt document
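The file totals implied by this line can be worked out directly. This is a minimal sketch that assumes exactly one video per angle per person. The card does not say whether one angle can yield several clips. The function name `expected_file_counts` is hypothetical.

```python
def expected_file_counts(num_people: int = 202,
                         angles_per_person: int = 13,
                         txt_per_person: int = 1) -> dict[str, int]:
    """Return expected file totals, assuming one video per angle per person."""
    return {
        "videos": num_people * angles_per_person,      # one clip per angle (assumption)
        "txt_documents": num_people * txt_per_person,  # "+1 txt document" per person
    }


counts = expected_file_counts()
print(counts)  # {'videos': 2626, 'txt_documents': 202}
```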
## People distribution
race distribution: Asian (Indonesian); gender distribution: 89 males, 113 females; age distribution: 165 people aged 18-30, 32 people aged 31-45, and 5 people aged 46-60
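The gender breakdown and the age breakdown should each add up to the 202 participants stated under "Data size". The short check below confirms this and reports each group's share, rounded to one decimal place. The dictionary keys and the function name are illustrative labels. They are not fields in the dataset.

```python
TOTAL_PEOPLE = 202

# Counts copied from the "People distribution" line of this card.
GENDER = {"male": 89, "female": 113}
AGE = {"18-30": 165, "31-45": 32, "46-60": 5}


def percentage_shares(breakdown: dict[str, int], total: int) -> dict[str, float]:
    """Check the breakdown covers everyone, then return each group's share in percent."""
    if sum(breakdown.values()) != total:
        raise ValueError(f"breakdown sums to {sum(breakdown.values())}, expected {total}")
    return {group: round(100 * n / total, 1) for group, n in breakdown.items()}


print(percentage_shares(GENDER, TOTAL_PEOPLE))  # {'male': 44.1, 'female': 55.9}
print(percentage_shares(AGE, TOTAL_PEOPLE))     # {'18-30': 81.7, '31-45': 15.8, '46-60': 2.5}
```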
## Collecting environment
indoor natural light scenes, indoor fluorescent lamp scenes
## Data diversity
including multiple scenes, different ages, different shooting angles
## Device
cellphone; resolution 1920 × 1080
## Collecting angle
audio and video data were collected simultaneously from 13 different angles: front face, 3 left side-face angles, 3 right side-face angles, looking down, looking up, left side face looking down, right side face looking down, left side face looking up, and right side face looking up
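The 13 angles break down as 1 frontal, 3 left side-face, 3 right side-face, and 6 up/down or diagonal poses. The sketch below writes that list out so a loader can check that it received one stream per angle for each person. Every label here is a hypothetical identifier, because the card does not say how angles are named in the files.

```python
# Hypothetical angle labels; the card lists the angles but not their file naming.
ANGLES = (
    ["front"]
    + [f"left_side_{i}" for i in range(1, 4)]    # 3 left side-face angles
    + [f"right_side_{i}" for i in range(1, 4)]   # 3 right side-face angles
    + ["look_down", "look_up",
       "left_side_down", "right_side_down",
       "left_side_up", "right_side_up"]
)

# Sanity check: 13 distinct angles, matching the card.
assert len(ANGLES) == len(set(ANGLES)) == 13


def missing_angles(recorded: set[str]) -> list[str]:
    """Return the expected angle labels absent from one person's recordings."""
    return [a for a in ANGLES if a not in recorded]


print(missing_angles(set(ANGLES) - {"look_up"}))  # ['look_up']
```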
## Recording content
general domain, unrestricted content
## Language
Mandarin Chinese; each video is longer than 20 seconds
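Because every video runs longer than 20 seconds, the card implies a lower bound on total footage. This is a minimal sketch that assumes one video per angle per person, which the card does not state. All 13 angles are recorded at the same time. As a result, the bound on distinct speech content per person is still only 20 seconds. The function name is hypothetical.

```python
MIN_VIDEO_SECONDS = 20  # "each video is more than 20 seconds"


def min_total_hours(num_people: int = 202, angles: int = 13,
                    min_seconds: float = MIN_VIDEO_SECONDS) -> float:
    """Lower bound on total video footage in hours, assuming one clip per angle."""
    return num_people * angles * min_seconds / 3600


# All angles record the same session, so angles=1 bounds the distinct speech.
print(f"{min_total_hours():.2f} h of video")           # 14.59 h of video
print(f"{min_total_hours(angles=1):.2f} h of speech")  # 1.12 h of speech
```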
## Data format
the video format is .mp4; the audio is sampled at 16 kHz or higher with 16-bit depth; the frame rate is 25-30 fps
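The audio specification (at least 16 kHz, 16-bit) can be checked with Python's standard `wave` module once the audio track has been extracted to WAV, for example with ffmpeg. The dataset ships .mp4 files, so that extraction step is an assumption. The sketch also computes how many frames a 20-second clip has at each end of the 25-30 fps range. Because every video runs longer than 20 seconds, these counts are a floor. Both function names are hypothetical.

```python
import wave

MIN_SAMPLE_RATE_HZ = 16_000  # "greater than or equal to 16KHz"
SAMPLE_WIDTH_BYTES = 2       # 16-bit PCM samples are 2 bytes wide


def audio_meets_spec(wav_path: str) -> bool:
    """True if the WAV file is 16-bit and sampled at 16 kHz or higher."""
    with wave.open(wav_path, "rb") as wf:
        return (wf.getframerate() >= MIN_SAMPLE_RATE_HZ
                and wf.getsampwidth() == SAMPLE_WIDTH_BYTES)


def min_frames(seconds: float = 20, fps: float = 25) -> int:
    """Frames in a clip of the given length at a constant frame rate."""
    return int(seconds * fps)


print(min_frames(fps=25), min_frames(fps=30))  # 500 600
```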
## Accuracy rate
sentence-level accuracy is above 95%
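The card does not say how sentence accuracy is measured. One common reading, assumed here, is the share of transcript sentences that exactly match a verified reference. The sketch below computes that figure for two parallel lists of sentences. The function name and the whitespace handling are illustrative choices, not the vendor's procedure.

```python
def sentence_accuracy(transcripts: list[str], references: list[str]) -> float:
    """Fraction of sentences whose transcript matches the reference exactly.

    Leading and trailing whitespace is stripped before comparing; no other
    normalization is applied.
    """
    if len(transcripts) != len(references):
        raise ValueError("transcripts and references must be the same length")
    if not references:
        raise ValueError("need at least one sentence")
    correct = sum(t.strip() == r.strip() for t, r in zip(transcripts, references))
    return correct / len(references)


refs = ["今天天气很好", "我们去公园吧"]
hyps = ["今天天气很好", "我们去公园"]
print(sentence_accuracy(hyps, refs))  # 0.5
```

Under this reading, the card's claim corresponds to `sentence_accuracy(...) > 0.95` over the whole corpus.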
# Licensing Information
Commercial License
Provider: Nexdata