Mandarin Audio-Visual Corpus
Source: arXiv, indexed 2025-09-30
Download link:
https://lixiyun98.github.io/SA-RNN/
Overview:
This dataset was collected from YouTube and contains approximately 200 hours of audio from 1,500 speakers. It includes simulated multi-channel, multi-speaker data: the training, validation, and test splits contain 153,800, 500, and 1,053 multi-channel mixtures, respectively. Each mixture (a single utterance) contains up to three speakers talking simultaneously. The dataset is intended for multi-speaker speech separation and recognition.
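The description does not say how the mixtures were simulated. The sketch below is only an illustration, in standard-library Python, of what a multi-channel mixture with at most three overlapping speakers looks like as data. `Mixture`, `simulate_mixture`, the four-channel default, and the random per-channel delay/gain model are all hypothetical and are not the corpus's actual recipe; realistic pipelines usually convolve each source with measured or simulated room impulse responses instead.

```python
import random
from dataclasses import dataclass

# Split sizes as stated in the dataset description (number of multi-channel mixtures).
SPLIT_SIZES = {"train": 153_800, "validation": 500, "test": 1_053}
MAX_SPEAKERS = 3  # at most three simultaneous speakers per mixture


@dataclass
class Mixture:
    """Hypothetical container for one simulated mixture (not the official format)."""
    channels: list[list[float]]  # channels[c][t]: sample t of microphone c
    num_speakers: int


def simulate_mixture(sources, num_channels=4, max_delay=8, seed=0):
    """Sum 1..MAX_SPEAKERS single-channel sources into a multi-channel mixture.

    Assumption: each source reaches every channel with a random integer delay
    and gain, a crude stand-in for room acoustics, not the corpus's method.
    """
    if not 1 <= len(sources) <= MAX_SPEAKERS:
        raise ValueError(f"expected 1..{MAX_SPEAKERS} speakers, got {len(sources)}")
    rng = random.Random(seed)  # seeded so the mixture is reproducible
    # Pad the output so the longest source still fits after the largest delay.
    length = max(len(s) for s in sources) + max_delay
    channels = [[0.0] * length for _ in range(num_channels)]
    for src in sources:
        for ch in channels:
            delay = rng.randint(0, max_delay)
            gain = rng.uniform(0.5, 1.0)
            for t, x in enumerate(src):
                ch[t + delay] += gain * x  # overlapping speakers simply add up
    return Mixture(channels=channels, num_speakers=len(sources))


if __name__ == "__main__":
    mix = simulate_mixture([[1.0, 0.0, -1.0], [0.5] * 5, [0.2] * 4])
    print(len(mix.channels), len(mix.channels[0]), mix.num_speakers)  # 4 13 3
    print(sum(SPLIT_SIZES.values()))  # 155353 mixtures across all splits
```

Passing a fourth source raises `ValueError`, mirroring the stated limit of three simultaneous speakers per mixture.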



