Parallel Audiobook Corpus

Scottish Government Open Data Portal | Updated 2018-11-09 | Indexed 2026-03-28

Download link:

https://doi.org/10.7488/ds/2468

Description:

The Parallel Audiobook Corpus (version 1.0) is a collection of parallel readings of audiobooks. The corpus consists of approximately 121 hours of speech sampled at 22.05 kHz, covering 4 books and 59 speakers. The data is provided in two formats. Chapter data contains the audiobook recordings at the chapter level; each chapter-level waveform is accompanied by its text and the corresponding word-level alignment. Use this format if you need a segmentation that does not correspond to utterance-level units. Segmented data provides a more traditional format for the corpus: the chapter-level alignments were segmented into utterances, with waveforms organized by speaker. Within each book, utterance identifiers are consistent across speakers, which makes it simple to find parallel data.
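
Because utterance identifiers are shared across speakers within a book, parallel readings can be collected by grouping files on the pair (book, utterance ID). The following is a minimal Python sketch of that idea. The directory layout `<book>/<speaker>/<utterance_id>.wav`, the function name `group_parallel`, and the example names are assumptions made for illustration; they are not taken from the corpus documentation, so adjust the path parsing to the actual layout of the download.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_parallel(wav_paths):
    """Group segmented waveform paths into sets of parallel readings.

    ASSUMPTION: paths follow a hypothetical layout
    <book>/<speaker>/<utterance_id>.wav. The real corpus layout may differ.
    Returns {(book, utterance_id): {speaker: path}} for every utterance
    that was read by at least two speakers.
    """
    groups = defaultdict(dict)
    for p in wav_paths:
        path = PurePosixPath(p)
        speaker = path.parent.name        # directory holding the file
        book = path.parent.parent.name    # directory above the speaker
        groups[(book, path.stem)][speaker] = p
    # Keep only utterances with more than one reader, i.e. parallel data.
    return {key: spk for key, spk in groups.items() if len(spk) >= 2}


# Hypothetical example paths (not real corpus file names).
example = [
    "book_a/spk01/utt_0001.wav",
    "book_a/spk02/utt_0001.wav",
    "book_a/spk01/utt_0002.wav",
]
parallel = group_parallel(example)
```

Here `parallel` holds a single entry, for `utt_0001` in `book_a`, since that is the only utterance in the example read by two speakers.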

Created:

2018-11-09
