Libri-Adapt
Source: arXiv. Updated: 2020-09-07. Indexed: 2024-06-21.
Download link:
https://bit.ly/2UK1McG
Overview:
Libri-Adapt is a large-scale speech dataset designed for research on unsupervised domain adaptation, created jointly by University College London, Nokia Bell Labs Cambridge, and the University of Oxford. It is built on the LibriSpeech corpus, contains 7200 hours of English speech, and spans 72 distinct domains that vary in recording conditions and accent. The dataset was produced by recording the LibriSpeech-clean-100 training corpus on different microphones, in three English accents (American, British, and Indian), under four background noise conditions (clean, plus synthetic rain, wind, and laughter). By simulating the challenging scenarios that automatic speech recognition (ASR) models face in real-world use, Libri-Adapt supports research on unsupervised domain adaptation for ASR, particularly the domain shift caused by heterogeneity in microphone hardware and software processing pipelines.
Provided by:
University College London, UK; Nokia Bell Labs, Cambridge, UK; University of Oxford, UK
Created:
2020-09-07



