Nexdata | Lip Multimodal Data | 2,000 ID | Lip Sync Data | Audio Image AI & ML Training Data | Annotated Imagery Data

Datarade, indexed 2024-04-19

Download link:

https://datarade.ai/data-products/nexdata-lip-multimodal-data-2-000-id-audio-image-ai-ml-nexdata

Resource overview:

1. Specifications
- Data size: 2,000 IDs; for each person, audio and video data were collected from 13 different angles, plus 1 .txt document
- People distribution: race: Asian, Caucasian, Black, Brown; gender: balanced; age: 18 to 60
- Collecting environment: indoor natural-light scenes, indoor fluorescent-lamp scenes
- Data diversity: multiple scenes, different ages, different shooting angles
- Device: cellphone; resolution 1,920 × 1,080
- Collecting angles: front face; left side face (3 angles); right side face (3 angles); looking down; looking up; left side face down; right side face down; left side face up; right side face up. All 13 angles were recorded at the same time.
- Recording content: general domain, unrestricted content
- Language: 10 languages; each video is longer than 20 seconds
- Data format: video in .mp4; audio at 16 kHz or higher, 16-bit; frame rate 25-30 fps
- Accuracy rate: sentence accuracy above 95%

2. About Nexdata
Nexdata offers off-the-shelf data: 200,000 hours of speech recognition data, 800 TB of annotated imagery data, and about 2 billion pieces of natural language processing (NLP) data. These ready-to-go datasets support instant delivery and can help improve the accuracy of AI models quickly. For more details, visit https://www.nexdata.ai/computerVisionTraining?source=Datarade
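The specifications above translate naturally into a delivery check. The following is a minimal sketch, not part of the Nexdata product: the `ClipInfo` fields and both function names are hypothetical, and the metadata is assumed to have been extracted already by some media-probing tool. Only the thresholds (13 angles, .mp4, 1,920 × 1,080, 25-30 fps, more than 20 seconds, at least 16 kHz, 16-bit) come from the listing.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the listing's specification.
EXPECTED_ANGLES = 13          # clips per person (ID)
MIN_SAMPLE_RATE_HZ = 16_000   # audio is >= 16 kHz
BIT_DEPTH = 16                # audio is 16-bit
FPS_RANGE = (25.0, 30.0)      # frame rate is 25-30 fps
MIN_DURATION_S = 20.0         # each video is longer than 20 seconds
RESOLUTION = (1920, 1080)     # device resolution


@dataclass
class ClipInfo:
    """Hypothetical per-clip metadata; field names are illustrative only."""
    path: str
    width: int
    height: int
    fps: float
    duration_s: float
    sample_rate_hz: int
    bit_depth: int


def check_clip(clip: ClipInfo) -> List[str]:
    """Return human-readable spec violations for one clip (empty if none)."""
    problems = []
    if not clip.path.lower().endswith(".mp4"):
        problems.append("container is not .mp4")
    # Assumption: the listing does not state orientation, so both
    # 1920x1080 and 1080x1920 are accepted here.
    if sorted((clip.width, clip.height)) != sorted(RESOLUTION):
        problems.append(f"resolution {clip.width}x{clip.height} is not 1920x1080")
    if not FPS_RANGE[0] <= clip.fps <= FPS_RANGE[1]:
        problems.append(f"frame rate {clip.fps} is outside 25-30 fps")
    if clip.duration_s <= MIN_DURATION_S:
        problems.append(f"duration {clip.duration_s}s is not longer than 20s")
    if clip.sample_rate_hz < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sample rate {clip.sample_rate_hz} Hz is below 16 kHz")
    if clip.bit_depth != BIT_DEPTH:
        problems.append(f"bit depth {clip.bit_depth} is not 16")
    return problems


def check_subject(clips: List[ClipInfo]) -> List[str]:
    """Check all clips for one person, including the 13-angle count."""
    problems = []
    if len(clips) != EXPECTED_ANGLES:
        problems.append(f"expected {EXPECTED_ANGLES} clips, found {len(clips)}")
    for clip in clips:
        problems.extend(f"{clip.path}: {p}" for p in check_clip(clip))
    return problems


if __name__ == "__main__":
    sample = ClipInfo("front.mp4", 1920, 1080, 30.0, 24.0, 16_000, 16)
    print(check_clip(sample))
```

Returning a list of messages rather than a single boolean makes it easier to report every deviation for a person's recordings in one pass.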


Provider:

Nexdata

5,000+ high-quality datasets

54 task types