libritts_r
ModelScope community. Updated 2026-05-19; indexed 2025-04-19.
Download link: https://modelscope.cn/datasets/pengzhendong/libritts_r
Resource description:
# Dataset Card for LibriTTS-R
LibriTTS-R [1] is a sound-quality-improved version of the LibriTTS corpus
(http://www.openslr.org/60/), a multi-speaker English corpus of approximately
585 hours of read English speech at a 24 kHz sampling rate, published in 2019.
## Overview
This is the LibriTTS-R dataset, adapted for the `datasets` library.
## Usage
### Splits
There are 7 splits (dots replace the dashes used in the original dataset, to comply with Hugging Face naming requirements):
- dev.clean
- dev.other
- test.clean
- test.other
- train.clean.100
- train.clean.360
- train.other.500
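The dash-to-dot renaming described above can be sketched as a small helper. The names `ORIGINAL_SUBSETS`, `to_split_name`, and `to_original_name` are illustrative assumptions and are not part of the dataset or the `datasets` library.

```python
# Dash-separated subset names as used by the original corpus.
ORIGINAL_SUBSETS = [
    "dev-clean",
    "dev-other",
    "test-clean",
    "test-other",
    "train-clean-100",
    "train-clean-360",
    "train-other-500",
]


def to_split_name(original: str) -> str:
    """Convert an original subset name to the dotted split name."""
    return original.replace("-", ".")


def to_original_name(split: str) -> str:
    """Convert a dotted split name back to the original dashed name."""
    return split.replace(".", "-")


# Dotted split names, in the same order as the list above.
SPLITS = [to_split_name(name) for name in ORIGINAL_SUBSETS]
```

This is handy when a split name has to be matched against directory names in the original archives, which keep the dashed form (for example `dev-clean`).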
### Configurations
There are 4 configurations, each of which limits the splits that the `load_dataset()` function will download.
The default configuration is "all".
- "dev": only the "dev.clean" split (good for testing the dataset quickly)
- "clean": contains only "clean" splits
- "other": contains only "other" splits
- "all": contains all splits
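As a rough illustration of how the configurations relate to the splits, the mapping below is inferred from the descriptions above rather than read from the dataset's loader, so treat it as an assumption. `CONFIG_SPLITS` and `smallest_config_for` are hypothetical names introduced only for this sketch.

```python
# Assumed mapping from configuration name to the splits it exposes,
# inferred from the prose descriptions (not taken from the loader itself).
CONFIG_SPLITS = {
    "dev": ["dev.clean"],
    "clean": ["dev.clean", "test.clean", "train.clean.100", "train.clean.360"],
    "other": ["dev.other", "test.other", "train.other.500"],
    "all": [
        "dev.clean",
        "dev.other",
        "test.clean",
        "test.other",
        "train.clean.100",
        "train.clean.360",
        "train.other.500",
    ],
}


def smallest_config_for(split: str) -> str:
    """Return the configuration with the fewest splits that still contains `split`."""
    candidates = [name for name, splits in CONFIG_SPLITS.items() if split in splits]
    if not candidates:
        raise ValueError(f"unknown split: {split!r}")
    return min(candidates, key=lambda name: len(CONFIG_SPLITS[name]))
```

Choosing the narrowest configuration that contains the split you need keeps the download as small as possible.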
### Example
Loading the `clean` config with only the `train.clean.100` split.
```
from datasets import load_dataset

dataset = load_dataset("blabble-io/libritts_r", "clean", split="train.clean.100")
```
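Streaming is also supported; pass `streaming=True` to iterate over examples without downloading the full split first.
```
from datasets import load_dataset

dataset = load_dataset("blabble-io/libritts_r", streaming=True)
```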
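A minimal sketch of previewing a few rows from a streamed split follows. The helper names `take` and `preview_streaming` are assumptions made for illustration, and depending on the installed `datasets` version the `load_dataset()` call may need additional arguments.

```python
from itertools import islice


def take(rows, n):
    """Return the first `n` items of any iterable as a list."""
    return list(islice(rows, n))


def preview_streaming(n=3):
    """Stream the small `dev` config and return (id, text) pairs for `n` rows."""
    # Imported lazily so that `take` works even without `datasets` installed.
    from datasets import load_dataset

    stream = load_dataset(
        "blabble-io/libritts_r", "dev", split="dev.clean", streaming=True
    )
    return [(row["id"], row["text_normalized"]) for row in take(stream, n)]
```

Because a streamed dataset is a plain iterable, `islice` stops after `n` rows instead of reading the whole split.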
### Columns
```
{
"audio": datasets.Audio(sampling_rate=24_000),
"text_normalized": datasets.Value("string"),
"text_original": datasets.Value("string"),
"speaker_id": datasets.Value("string"),
"path": datasets.Value("string"),
"chapter_id": datasets.Value("string"),
"id": datasets.Value("string"),
}
```
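With the sampling rate fixed at 24 kHz, a clip's duration follows directly from the number of samples. The sketch below assumes a decoded `audio` value is a dict with `array` and `sampling_rate` keys; some `datasets` versions may return a different object. `duration_seconds` is a hypothetical helper name.

```python
def duration_seconds(audio) -> float:
    """Duration of a decoded clip: number of samples divided by the sampling rate."""
    return len(audio["array"]) / audio["sampling_rate"]


# Synthetic stand-in for a decoded row: 48,000 samples at 24 kHz is 2 seconds.
fake_audio = {"array": [0.0] * 48_000, "sampling_rate": 24_000}
clip_length = duration_seconds(fake_audio)
```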
### Example Row
```
{
'audio': {
'path': '/home/user/.cache/huggingface/datasets/downloads/extracted/5551a515e85b9e463062524539c2e1cb52ba32affe128dffd866db0205248bdd/LibriTTS_R/dev-clean/3081/166546/3081_166546_000028_000002.wav',
'array': ...,
'sampling_rate': 24000
},
'text_normalized': 'How quickly he disappeared!"',
'text_original': 'How quickly he disappeared!"',
'speaker_id': '3081',
'path': '/home/user/.cache/huggingface/datasets/downloads/extracted/5551a515e85b9e463062524539c2e1cb52ba32affe128dffd866db0205248bdd/LibriTTS_R/dev-clean/3081/166546/3081_166546_000028_000002.wav',
'chapter_id': '166546',
'id': '3081_166546_000028_000002'
}
```
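In the example row, `id` begins with the speaker and chapter identifiers and matches the audio file name without its extension. The sketch below relies only on that observed pattern and leaves the trailing index fields uninterpreted. `UtteranceId`, `parse_utterance_id`, and `id_from_path` are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class UtteranceId:
    speaker_id: str
    chapter_id: str
    rest: tuple  # trailing index fields, kept as-is without interpretation


def parse_utterance_id(utt_id: str) -> UtteranceId:
    """Split an id such as '3081_166546_000028_000002' on underscores."""
    parts = utt_id.split("_")
    if len(parts) < 3:
        raise ValueError(f"unexpected id format: {utt_id!r}")
    return UtteranceId(parts[0], parts[1], tuple(parts[2:]))


def id_from_path(path: str) -> str:
    """Recover the id from an audio path by dropping directories and extension."""
    return PurePosixPath(path).stem
```

For the row shown above, `parse_utterance_id(row["id"])` gives a `speaker_id` and `chapter_id` that agree with the row's own `speaker_id` and `chapter_id` columns.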
## Dataset Details
### Dataset Description
- **License:** CC BY 4.0
### Dataset Sources
- **Homepage:** https://www.openslr.org/141/
- **Paper:** https://arxiv.org/abs/2305.18802
## Citation
```
@ARTICLE{Koizumi2023-hs,
title = "{LibriTTS-R}: A restored multi-speaker text-to-speech corpus",
author = "Koizumi, Yuma and Zen, Heiga and Karita, Shigeki and Ding,
Yifan and Yatabe, Kohei and Morioka, Nobuyuki and Bacchiani,
Michiel and Zhang, Yu and Han, Wei and Bapna, Ankur",
abstract = "This paper introduces a new speech dataset called
``LibriTTS-R'' designed for text-to-speech (TTS) use. It is
derived by applying speech restoration to the LibriTTS
corpus, which consists of 585 hours of speech data at 24 kHz
sampling rate from 2,456 speakers and the corresponding
texts. The constituent samples of LibriTTS-R are identical
to those of LibriTTS, with only the sound quality improved.
Experimental results show that the LibriTTS-R ground-truth
samples showed significantly improved sound quality compared
to those in LibriTTS. In addition, neural end-to-end TTS
trained with LibriTTS-R achieved speech naturalness on par
with that of the ground-truth samples. The corpus is freely
available for download from
\url{http://www.openslr.org/141/}.",
month = may,
year = 2023,
copyright = "http://creativecommons.org/licenses/by-nc-nd/4.0/",
archivePrefix = "arXiv",
primaryClass = "eess.AS",
eprint = "2305.18802"
}
```
Provider: maas
Created: 2025-04-16



