Parallel Audiobook Corpus
Source: Scottish Government Open Data Portal | Updated: 2018-11-09 | Indexed: 2026-03-28
Download link:
https://doi.org/10.7488/ds/2468
Description:
The Parallel Audiobook Corpus (version 1.0) is a collection of parallel readings of audiobooks. The corpus contains approximately 121 hours of speech, sampled at 22.05 kHz, across 4 books and 59 speakers. The data are provided in two formats.

Chapter data contains the audiobook recordings at the chapter level. Each chapter-level waveform is accompanied by its text and a word-level alignment. Use this format if you need a segmentation that does not correspond to utterance-level units.

Segmented data provides a more traditional format for the corpus: the chapter-level alignments were segmented into utterances, with the waveforms organized by speaker. Within each book, utterance identifiers are consistent across speakers, which makes it simple to find parallel data.
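As a rough illustration of how shared utterance identifiers can be used, the sketch below groups segmented files by (book, utterance ID), so that each group collects every speaker's reading of the same utterance. The directory layout `<book>/<speaker>/<utterance_id>.wav` and the function names `group_parallel_utterances` and `parallel_groups` are hypothetical assumptions, not taken from the corpus documentation; adapt the path parsing to the actual release. The book is part of the key because identifiers are only described as consistent within a book.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_parallel_utterances(paths):
    """Map (book, utterance_id) -> {speaker: path}.

    ASSUMPTION: hypothetical layout <book>/<speaker>/<utterance_id>.wav;
    the real corpus layout may differ.
    """
    groups = defaultdict(dict)
    for p in paths:
        path = PurePosixPath(p)
        # Last three components: book directory, speaker directory, file name.
        book, speaker = path.parts[-3], path.parts[-2]
        # Key on the book too: IDs are only consistent within a single book.
        groups[(book, path.stem)][speaker] = p
    return dict(groups)


def parallel_groups(groups, min_speakers=2):
    """Keep only utterances read by at least `min_speakers` speakers."""
    return {key: readings for key, readings in groups.items()
            if len(readings) >= min_speakers}


if __name__ == "__main__":
    # Made-up example paths following the assumed layout.
    example = [
        "bookA/spk01/utt_0001.wav",
        "bookA/spk02/utt_0001.wav",
        "bookA/spk01/utt_0002.wav",
        "bookB/spk03/utt_0001.wav",
    ]
    for key, readings in parallel_groups(group_parallel_utterances(example)).items():
        print(key, sorted(readings))
```

With these example paths, only `('bookA', 'utt_0001')` is printed, read by `spk01` and `spk02`; `bookB`'s `utt_0001` stays in its own group because it belongs to a different book.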
Created:
2018-11-09



