parler-tts/libritts_r_filtered

Name: parler-tts/libritts_r_filtered
Creator: parler-tts
Published: 2024-08-06 16:45:54
License: cc-by-4.0

Source: Hugging Face. Updated 2024-08-06. Indexed 2025-04-12.

Download link:

https://hf-mirror.com/datasets/parler-tts/libritts_r_filtered

Description:

---
license: cc-by-4.0
task_categories:
- text-to-speech
- automatic-speech-recognition
language:
- en
size_categories:
- 10K<n<100K
dataset_info:
- config_name: clean
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 24000
  - name: text_normalized
    dtype: string
  - name: text_original
    dtype: string
  - name: speaker_id
    dtype: string
  - name: path
    dtype: string
  - name: chapter_id
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: dev.clean
    num_bytes: 1506311977.8882804
    num_examples: 5589
  - name: test.clean
    num_bytes: 1432099582.6705585
    num_examples: 4689
  - name: train.clean.100
    num_bytes: 8985618654.720787
    num_examples: 32215
  - name: train.clean.360
    num_bytes: 31794257100.91056
    num_examples: 112326
  download_size: 44461321972
  dataset_size: 43718287316.190186
- config_name: other
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 24000
  - name: text_normalized
    dtype: string
  - name: text_original
    dtype: string
  - name: speaker_id
    dtype: string
  - name: path
    dtype: string
  - name: chapter_id
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: dev.other
    num_bytes: 1042714063.4789225
    num_examples: 4342
  - name: test.other
    num_bytes: 1061489621.2561874
    num_examples: 4716
  - name: train.other.500
    num_bytes: 50718457351.73659
    num_examples: 194626
  download_size: 54153699917
  dataset_size: 52822661036.471695
configs:
- config_name: clean
  data_files:
  - split: dev.clean
    path: clean/dev.clean-*
  - split: test.clean
    path: clean/test.clean-*
  - split: train.clean.100
    path: clean/train.clean.100-*
  - split: train.clean.360
    path: clean/train.clean.360-*
- config_name: other
  data_files:
  - split: dev.other
    path: other/dev.other-*
  - split: test.other
    path: other/test.other-*
  - split: train.other.500
    path: other/train.other.500-*
pretty_name: Filtered LibriTTS-R
---

# Dataset Card for Filtered LibriTTS-R

This is a filtered version of [LibriTTS-R](https://huggingface.co/datasets/mythicinfinity/libritts_r). It has been filtered based on two sources:

1. The LibriTTS-R paper [1], which lists samples for which speech restoration has failed.
2. The LibriTTS-P [2] list of [excluded speakers](https://github.com/line/LibriTTS-P/blob/main/data/excluded_spk_list.txt), for which multiple speakers have been detected.

LibriTTS-R [1] is a version of the [LibriTTS corpus](http://www.openslr.org/60/) with improved sound quality. LibriTTS is a multi-speaker English corpus of approximately 585 hours of read English speech at a 24 kHz sampling rate, published in 2019.

## Usage

### Example

Loading the `clean` config with only the `train.clean.100` split.

```py
from datasets import load_dataset

# Load one split of the "clean" config from this filtered dataset.
dataset = load_dataset("parler-tts/libritts_r_filtered", "clean", split="train.clean.100")
```

Streaming is also supported.

```py
from datasets import load_dataset

# Stream the "clean" config instead of downloading it first.
dataset = load_dataset("parler-tts/libritts_r_filtered", "clean", streaming=True)
```

### Splits

There are 7 splits (dots replace dashes from the original dataset, to comply with Hugging Face naming requirements):

- dev.clean
- dev.other
- test.clean
- test.other
- train.clean.100
- train.clean.360
- train.other.500

### Configurations

There are 4 configurations, each of which limits the splits the `load_dataset()` function will download. The default configuration is "all".
- "dev": only the "dev.clean" split (good for testing the dataset quickly)
- "clean": contains only "clean" splits
- "other": contains only "other" splits
- "all": contains all splits

Note that the YAML metadata for this filtered dataset defines only the "clean" and "other" configurations.

### Columns

```
{
    "audio": datasets.Audio(sampling_rate=24_000),
    "text_normalized": datasets.Value("string"),
    "text_original": datasets.Value("string"),
    "speaker_id": datasets.Value("string"),
    "path": datasets.Value("string"),
    "chapter_id": datasets.Value("string"),
    "id": datasets.Value("string"),
}
```

### Example Row

```
{
    'audio': {
        'path': '/home/user/.cache/huggingface/datasets/downloads/extracted/5551a515e85b9e463062524539c2e1cb52ba32affe128dffd866db0205248bdd/LibriTTS_R/dev-clean/3081/166546/3081_166546_000028_000002.wav',
        'array': ...,
        'sampling_rate': 24000
    },
    'text_normalized': 'How quickly he disappeared!"',
    'text_original': 'How quickly he disappeared!"',
    'speaker_id': '3081',
    'path': '/home/user/.cache/huggingface/datasets/downloads/extracted/5551a515e85b9e463062524539c2e1cb52ba32affe128dffd866db0205248bdd/LibriTTS_R/dev-clean/3081/166546/3081_166546_000028_000002.wav',
    'chapter_id': '166546',
    'id': '3081_166546_000028_000002'
}
```

## Dataset Details

### Dataset Description

- **License:** CC BY 4.0

### Dataset Sources

- **Homepage:** https://www.openslr.org/141/
- **Paper:** https://arxiv.org/abs/2305.18802

## Citation

[1] LibriTTS-R

```
@ARTICLE{Koizumi2023-hs,
  title         = "{LibriTTS-R}: A restored multi-speaker text-to-speech corpus",
  author        = "Koizumi, Yuma and Zen, Heiga and Karita, Shigeki and Ding, Yifan and Yatabe, Kohei and Morioka, Nobuyuki and Bacchiani, Michiel and Zhang, Yu and Han, Wei and Bapna, Ankur",
  abstract      = "This paper introduces a new speech dataset called ``LibriTTS-R'' designed for text-to-speech (TTS) use. It is derived by applying speech restoration to the LibriTTS corpus, which consists of 585 hours of speech data at 24 kHz sampling rate from 2,456 speakers and the corresponding texts. The constituent samples of LibriTTS-R are identical to those of LibriTTS, with only the sound quality improved. Experimental results show that the LibriTTS-R ground-truth samples showed significantly improved sound quality compared to those in LibriTTS. In addition, neural end-to-end TTS trained with LibriTTS-R achieved speech naturalness on par with that of the ground-truth samples. The corpus is freely available for download from \url{http://www.openslr.org/141/}.",
  month         = may,
  year          = 2023,
  copyright     = "http://creativecommons.org/licenses/by-nc-nd/4.0/",
  archivePrefix = "arXiv",
  primaryClass  = "eess.AS",
  eprint        = "2305.18802"
}
```

[2] LibriTTS-P

```
@misc{kawamura2024librittspcorpusspeakingstyle,
  title={LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning},
  author={Masaya Kawamura and Ryuichi Yamamoto and Yuma Shirahata and Takuya Hasumi and Kentaro Tachibana},
  year={2024},
  eprint={2406.07969},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  url={https://arxiv.org/abs/2406.07969},
}
```
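
## Filtering sketch

The card describes three mechanical rules: split names replace dashes with dots, each `id` begins with the speaker and chapter identifiers, and rows are dropped when either the sample or its speaker is on an exclusion list. The following is a minimal sketch of those rules in plain Python, not the script the dataset authors used. The function names, the `paragraph` and `sentence` field names, and the toy exclusion sets are all assumptions made for illustration.

```python
from typing import Dict, Set


def to_hub_split_name(original_name: str) -> str:
    """Map an original subset name (dashes) to the split name used here (dots)."""
    return original_name.replace("-", ".")


def parse_utterance_id(utterance_id: str) -> Dict[str, str]:
    """Split an id such as "3081_166546_000028_000002" into four fields.

    The first two fields match the speaker_id and chapter_id columns in the
    card's example row. The names of the last two fields are assumptions.
    """
    speaker_id, chapter_id, paragraph, sentence = utterance_id.split("_")
    return {
        "speaker_id": speaker_id,
        "chapter_id": chapter_id,
        "paragraph": paragraph,  # hypothetical field name
        "sentence": sentence,  # hypothetical field name
    }


def keep_row(row: Dict[str, str], failed_ids: Set[str], excluded_speakers: Set[str]) -> bool:
    """Return True when the row is on neither exclusion list."""
    return row["id"] not in failed_ids and row["speaker_id"] not in excluded_speakers


# Toy rows: only the first id comes from the card's example row.
rows = [
    {"id": "3081_166546_000028_000002", "speaker_id": "3081"},
    {"id": "0001_000001_000000_000000", "speaker_id": "0001"},
    {"id": "0002_000001_000000_000000", "speaker_id": "0002"},
]
failed_ids = {"0001_000001_000000_000000"}  # made-up stand-in for the paper's list
excluded_speakers = {"0002"}  # made-up stand-in for the LibriTTS-P list

kept = [row for row in rows if keep_row(row, failed_ids, excluded_speakers)]

print(to_hub_split_name("train-clean-100"))
print(parse_utterance_id(rows[0]["id"]))
print([row["id"] for row in kept])
```

With the `datasets` library, the same predicate could be passed to `Dataset.filter`, for example `dataset.filter(lambda row: keep_row(row, failed_ids, excluded_speakers))`, once the two real exclusion lists have been loaded into sets.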

Provider:

parler-tts
