Vocal92: Multimodal Audio Dataset with a Cappella Solo Singing and Speech

Name: Vocal92: Multimodal Audio Dataset with a Cappella Solo Singing and Speech
Creator: deng, zhuo; zhou, ruohua
License: No description available

Source: IEEE; indexed 2026-04-17

Download link:

https://ieee-dataport.org/documents/vocal92-multimodal-audio-dataset-cappella-solo-singing-and-speech

Description:

We present Vocal92, a multivariate audio dataset of a cappella solo singing and speech, spanning approximately 146.73 hours and sourced from volunteers. To the best of our knowledge, it is the first dataset to focus specifically on a cappella solo singing and speech. We also build a singer recognition baseline system using two current state-of-the-art models. The dataset supports a wide range of applications, including music information retrieval, singer recognition, and multimodal speaker recognition. We believe the release of Vocal92 will be of significant interest to researchers in these fields, as well as to the broader multimodal audio processing community.

Provided by:

deng, zhuo; zhou, ruohua