Czech Audio-Visual Speech Corpus for Recognition with Impaired Conditions
Source: catalogue.elra.info | Updated: 2008-11-05 | Indexed: 2025-03-27
Download link:
https://catalogue.elra.info/en-us/repository/browse/ELRA-S0284/
Description:
This is an audio-visual speech database for training and testing Czech audio-visual continuous speech recognition systems, collected under impaired illumination conditions. The corpus consists of about 20 hours of audio-visual recordings of 50 speakers in laboratory conditions. Recorded subjects were instructed to remain static. The illumination varied, and portions of each speaker's session were recorded under several different conditions, such as full illumination or illumination from one side (left or right) only. These conditions make the database usable for training lip- and head-tracking systems under various illumination conditions, independently of the language. Each speaker was asked to read 200 sentences (50 common to all speakers and 150 specific to that speaker). The average total length of recording per speaker was 23 minutes.

Acoustic data are stored in wave files in PCM format, with a sampling frequency of 44 kHz and a resolution of 16 bits. Each speaker’s acoustic data set takes about 180 MB of disk space (about 8.8 GB for the whole corpus).

Visual data are stored in video files (.avi format) using the digital video (DV) codec. Visual data take about 3.7 GB of disk space per speaker (about 185 GB for the whole corpus) and are stored on an IDE hard disk (NTFS format).
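As a quick sanity check, the corpus-wide totals quoted in the description can be recomputed from the per-speaker figures. The sketch below uses only numbers stated above. The variable names are illustrative, and the conversion of 1 GB = 1024 MB is an assumption, since the catalogue entry does not say which convention it uses.

```python
# Per-speaker figures quoted in the description.
NUM_SPEAKERS = 50
MINUTES_PER_SPEAKER = 23        # average recording length per speaker
SENTENCES_PER_SPEAKER = 200     # 50 common + 150 speaker-specific
AUDIO_MB_PER_SPEAKER = 180
VIDEO_GB_PER_SPEAKER = 3.7

# Assumption: binary units (1 GB = 1024 MB); not stated in the catalogue.
MB_PER_GB = 1024

total_hours = NUM_SPEAKERS * MINUTES_PER_SPEAKER / 60
total_utterances = NUM_SPEAKERS * SENTENCES_PER_SPEAKER
total_audio_gb = NUM_SPEAKERS * AUDIO_MB_PER_SPEAKER / MB_PER_GB
total_video_gb = NUM_SPEAKERS * VIDEO_GB_PER_SPEAKER

print(f"Total duration:   {total_hours:.1f} h")
print(f"Total utterances: {total_utterances}")
print(f"Total audio:      {total_audio_gb:.1f} GB")
print(f"Total video:      {total_video_gb:.0f} GB")
```

Under these assumptions the results agree with the quoted totals: roughly 19 hours of recordings ("about 20 hours"), about 8.8 GB of audio, and about 185 GB of video.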
Provider:
catalogue.elra.info



