Parallel Audiobook Corpus

Name: Parallel Audiobook Corpus
Creator: University of Edinburgh. School of Informatics
Published: 2023-04-27 17:01:15
License: No description available

Source: DataCite Commons. Updated 2023-04-27; indexed 2025-04-17.

Download link:

https://datashare.ed.ac.uk/handle/10283/3217


Description:

The Parallel Audiobook Corpus (version 1.0) is a collection of parallel readings of audiobooks. The corpus consists of approximately 121 hours of speech sampled at 22.05 kHz across 4 books and 59 speakers. The data is provided in two formats. Chapter data contains the audiobook recordings at the chapter level; each chapter-level waveform is accompanied by its text and the corresponding word-level alignment. This format is useful if you need a segmentation that does not correspond to utterance-level units. Segmented data provides a more traditional format: the chapter-level alignments were segmented into utterances, with waveforms organized by speaker. Within each book, utterance identifiers are consistent across speakers, making it simple to find parallel data.
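
Because utterance identifiers are shared across speakers within a book, parallel readings can be collected by grouping files on the pair (book, utterance identifier). The following is a minimal sketch only. The directory layout `<book>/<speaker>/<utterance_id>.wav`, the function name `group_parallel_utterances`, and the example file names are assumptions made for illustration, not the documented structure of the corpus; check the actual layout of the downloaded archive before relying on it.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_parallel_utterances(wav_paths):
    """Return {(book, utterance_id): {speaker: path}} for utterances
    that were read by at least two speakers.

    ASSUMPTION: each path ends in <book>/<speaker>/<utterance_id>.wav.
    This layout is hypothetical and may not match the real corpus.
    """
    groups = defaultdict(dict)
    for path in wav_paths:
        p = PurePosixPath(path)
        if len(p.parts) < 3:
            # Too short to contain book/speaker/file; skip it.
            continue
        book, speaker = p.parts[-3], p.parts[-2]
        groups[(book, p.stem)][speaker] = path
    # Keep only utterances that actually have parallel readings.
    return {key: by_spk for key, by_spk in groups.items() if len(by_spk) > 1}


# Usage with made-up file names:
example_paths = [
    "bookA/spk01/utt_0001.wav",
    "bookA/spk02/utt_0001.wav",
    "bookA/spk01/utt_0002.wav",
]
parallel = group_parallel_utterances(example_paths)
```

In this example only `utt_0001` of `bookA` is returned, since it is the only utterance with recordings from more than one speaker. Keying on the book as well as the identifier reflects the description's statement that identifiers are consistent within each book, not necessarily across books.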

Provider:

University of Edinburgh. School of Informatics

Created:

2018-11-09
