VCTK (CSTR VCTK Corpus)

Name: VCTK (CSTR VCTK Corpus)
Creator: OpenDataLab
Published: 2026-05-24 04:30:03
License: No description provided

OpenDataLab, updated 2026-05-24, indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/VCTK

Resource description:

The CSTR VCTK Corpus contains speech data from 110 English speakers with various accents. Each speaker reads about 400 sentences, selected from a newspaper, the Rainbow Passage, and an elicitation paragraph used for the Speech Accent Archive. The newspaper texts were taken from the Herald Glasgow, with permission from the Herald & Times Group. Each speaker has a different set of newspaper texts, selected with a greedy algorithm that increases contextual and phonetic coverage. The text selection algorithm is described in: C. Veaux, J. Yamagishi, and S. King, "The voice bank corpus: Design, collection and data analysis of a large regional accent speech database", https://doi.org/10.1109/ICSDA.2013.6709856.

The Rainbow Passage and the elicitation paragraph are the same for all speakers. The Rainbow Passage can be found in the International Dialects of English Archive (http://web.ku.edu/~idea/readings/rainbow.htm). The elicitation paragraph is the same as the one used for the Speech Accent Archive (http://accent.gmu.edu). Details of the Speech Accent Archive can be found at http://www.ualberta.ca/~aacl2009/PDFs/WeinbergerKunath2009AACL.pdf.

All speech data was recorded with an identical setup: an omnidirectional microphone (DPA 4035) and a small-diaphragm condenser microphone with very wide bandwidth (Sennheiser MKH 800), at a 96 kHz sampling frequency and 24 bits, in a hemi-anechoic chamber at the University of Edinburgh. (However, the MKH 800 recordings of two speakers, p280 and p315, had technical issues.) All recordings were converted to 16 bits, downsampled to 48 kHz, and manually end-pointed.
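As a rough illustration of greedy coverage-based text selection, the sketch below repeatedly picks the candidate sentence that adds the most not-yet-covered units. It is not the algorithm from the cited paper: the function names `units` and `greedy_select` are hypothetical, and letter bigrams are used only as a simple stand-in for the contextual and phonetic units a real system would derive from a pronunciation lexicon.

```python
from typing import List, Set


def units(sentence: str) -> Set[str]:
    """Stand-in coverage units: letter bigrams inside each word (an assumption,
    not the phonetic/contextual units used for the actual corpus)."""
    found: Set[str] = set()
    for word in sentence.lower().split():
        letters = "".join(ch for ch in word if ch.isalpha())
        found.update(letters[i:i + 2] for i in range(len(letters) - 1))
    return found


def greedy_select(candidates: List[str], budget: int) -> List[str]:
    """Pick up to `budget` sentences, each time taking the one with the
    largest number of units not covered by earlier picks."""
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = list(candidates)
    while remaining and len(chosen) < budget:
        # Ties are resolved in favour of the earliest candidate.
        best = max(remaining, key=lambda s: len(units(s) - covered))
        if not units(best) - covered:
            break  # no remaining sentence adds new coverage
        chosen.append(best)
        covered |= units(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    pool = ["aa bb", "the cat sat", "the cat", "dog"]
    for sentence in greedy_select(pool, budget=3):
        print(sentence)
```

Stopping as soon as no sentence adds new units keeps the selected set small; a per-speaker variant could start each speaker with a different candidate pool so that the selected texts differ between speakers.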

Provider:

OpenDataLab

Created:

2022-04-28

Collection summary

Dataset introduction