3D-Speaker

Name: 3D-Speaker
Creator: DAMO Academy, Alibaba Group
Published: 2023-09-25 10:36:41
License: No description available

Source: arXiv. Updated 2023-09-25. Indexed 2024-06-21.

Download link:

https://3dspeaker.github.io/

Overview:

3D-Speaker is a large-scale speech corpus created by DAMO Academy, Alibaba Group, to support research on speech representation disentanglement. It contains more than 10,000 speakers. Each speaker is recorded simultaneously by multiple devices placed at different distances, and some speakers speak in more than one dialect. Because these dimensions (device, distance, dialect) are combined in a controlled way, the corpus yields a diverse matrix of entangled speech representations, which motivates methods for disentangling them. Its multi-domain nature also makes 3D-Speaker a suitable resource for evaluating large general-purpose speech models and for experimenting with out-of-domain learning and self-supervised learning methods. In addition, 3D-Speaker is described as the publicly accessible corpus with the largest number of speakers, and it can be used to improve speaker verification systems and other speech-related tasks.
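To make the idea of controlled combinations concrete, the sketch below pairs utterances and tags each pair with the conditions on which the two recordings differ. It is only an illustration: the `Utterance` record, its field names, and the toy values such as `"phone"` or `"near"` are assumptions made for this example and do not reflect the corpus's actual metadata schema or file layout.

```python
from collections import namedtuple
from itertools import combinations

# Hypothetical metadata record; field names are illustrative only,
# not the real 3D-Speaker schema.
Utterance = namedtuple("Utterance", "speaker device distance dialect")


def make_trials(utterances):
    """Pair every two utterances and tag the pair with its condition mismatches."""
    trials = []
    for a, b in combinations(utterances, 2):
        trials.append({
            "same_speaker": a.speaker == b.speaker,
            "cross_device": a.device != b.device,
            "cross_distance": a.distance != b.distance,
            "cross_dialect": a.dialect != b.dialect,
        })
    return trials


# Toy data (made up for the example).
utts = [
    Utterance("spk1", "phone", "near", "mandarin"),
    Utterance("spk1", "mic_array", "far", "mandarin"),
    Utterance("spk2", "phone", "near", "dialect_x"),
]

trials = make_trials(utts)

# Same-speaker pairs recorded on different devices: the kind of trial a
# cross-device speaker verification evaluation would focus on.
cross_device_targets = [
    t for t in trials if t["same_speaker"] and t["cross_device"]
]
print(len(trials), len(cross_device_targets))
```

Filtering the tagged pairs by one mismatch flag at a time is one simple way to isolate a single dimension (device, distance, or dialect) when comparing systems.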

Provided by:

DAMO Academy, Alibaba Group

Created:

2023-06-27