Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR)

Name: Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR)
Creator: The Hong Kong University of Science and Technology
Published: 2022-03-14 13:29:02
License: No description available

Source: arXiv. Updated: 2022-03-14. Indexed: 2024-06-21.

Download link:

https://github.com/HLTCHKUST/CI-AVSR

Resource overview:

CI-AVSR is the first Cantonese in-car audio-visual speech recognition dataset, developed by The Hong Kong University of Science and Technology. It covers 200 in-car commands recorded by 30 native Cantonese speakers, for a total of 4,984 samples and 8.3 hours of recordings. The data is augmented with 10 common in-car background noises to simulate real driving conditions, which also enlarges the dataset. It is intended to support audio-visual speech recognition for Cantonese and in multilingual settings, where visual information can improve recognition quality, particularly when the audio is noisy.
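The description does not say how the background noises were mixed into the recordings or at what level. As a rough illustration of this kind of augmentation only, the sketch below adds a noise clip to a speech clip at a chosen signal-to-noise ratio. The function name `mix_at_snr`, the synthetic signals, and the 10 dB value are all assumptions made for the example and are not taken from the CI-AVSR release.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR in dB.

    Hypothetical helper for illustration; not part of the CI-AVSR tooling.
    """
    speech = list(speech)
    noise = list(noise)
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Loop the noise clip so it covers the whole utterance, then trim it.
    repeats = -(-len(speech) // len(noise))  # ceiling division
    noise = (noise * repeats)[:len(speech)]
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0:
        return speech  # silent noise clip: nothing to add
    # Pick gain so that speech_power / (gain**2 * noise_power) == 10**(snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Synthetic stand-ins for a spoken command and an in-car noise recording.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
noisy = mix_at_snr(speech, noise, snr_db=10)
```

Applying each of several noise clips to every clean recording in this way yields one noisy copy per noise type, which is how noise augmentation typically multiplies the size of a dataset.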

Provider:

The Hong Kong University of Science and Technology

Created:

2022-01-11

Collection summary

Dataset introduction