ViSpeR

Name: ViSpeR
Creator: Research Community
License: Not specified

Indexed from arXiv on 2025-09-30

Download link:

https://github.com/YasserdahouML/visper

Description:

ViSpeR is a large-scale dataset for multilingual audio-visual speech recognition (AVSR). It covers Mandarin Chinese, Spanish, Arabic, French, and other languages. It consists of video-text pairs generated from high-quality web video content, and it aims to make AVSR data more diverse and more demographically representative. A hybrid video-selection approach is used to keep only high-quality, relevant content. The dataset substantially increases both the number of clips and the total duration in every language. Its scale is intended to address the challenges of the AVSR task.

Provided by:

Research Community