Parallel Audiobook Corpus
收藏DataCite Commons2023-04-27 更新2025-04-17 收录
下载链接:
https://datashare.ed.ac.uk/handle/10283/3217
下载链接
链接失效反馈官方服务:
资源简介:
The Parallel Audiobook Corpus (version 1.0) is a collection of parallel readings of audiobooks. The corpus consists of approximately 121 hours of speech at 22.05KHz across 4 books and 59 speakers. The data is provided in two formats. Chapter data contains the audiobook recording at the chapter level. Each chapter-level waveform is accompanied by the text and its respective word-level alignment. This format can be used if you are looking for a segmentation that does not correspond to utterance-level units. Segmented data provides a more traditional format for the corpus. The chapter-level alignment was segmented into utterances with waveforms organized by speaker. Note that, within each book, utterance identifiers are consistent across speakers, making it simple to find parallel data.
提供机构:
University of Edinburgh. School of Informatics
创建时间:
2018-11-09



