ViSpeR
Source: arXiv; indexed 2025-09-30
Download link:
https://github.com/YasserdahouML/visper
Description:
This dataset is a large-scale corpus for multilingual audio-visual speech recognition (AVSR). It covers several languages, including Chinese, Spanish, Arabic, and French. The dataset consists of video-text pairs derived from high-quality web video content. Videos are chosen with a hybrid selection approach that keeps the content relevant and high in quality while improving the diversity and demographic representation of the dataset. The number of clips and the total duration are substantially increased for every language covered, and the scale is intended to meet the demands of the AVSR task.
Provided by:
Research Community



