Mandarin Audio-Visual Corpus

Indexed from arXiv on 2025-09-30

Download link:

https://lixiyun98.github.io/SA-RNN/

Overview:

The dataset is collected from YouTube and contains approximately 200 hours of audio from 1,500 speakers. It includes simulated multi-channel, multi-speaker data: the training, validation, and test splits contain 153,800, 500, and 1,053 multi-channel mixtures, respectively. Up to three speakers may talk simultaneously within a single utterance. The dataset is intended for multi-speaker speech separation and recognition tasks.
