
parler-tts/libritts_r_filtered

Hugging Face · updated 2024-08-06 · added 2025-04-12
Download link:
https://hf-mirror.com/datasets/parler-tts/libritts_r_filtered
Description:
---
license: cc-by-4.0
task_categories:
- text-to-speech
- automatic-speech-recognition
language:
- en
size_categories:
- 10K<n<100K
dataset_info:
- config_name: clean
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 24000
  - name: text_normalized
    dtype: string
  - name: text_original
    dtype: string
  - name: speaker_id
    dtype: string
  - name: path
    dtype: string
  - name: chapter_id
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: dev.clean
    num_bytes: 1506311977.8882804
    num_examples: 5589
  - name: test.clean
    num_bytes: 1432099582.6705585
    num_examples: 4689
  - name: train.clean.100
    num_bytes: 8985618654.720787
    num_examples: 32215
  - name: train.clean.360
    num_bytes: 31794257100.91056
    num_examples: 112326
  download_size: 44461321972
  dataset_size: 43718287316.190186
- config_name: other
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 24000
  - name: text_normalized
    dtype: string
  - name: text_original
    dtype: string
  - name: speaker_id
    dtype: string
  - name: path
    dtype: string
  - name: chapter_id
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: dev.other
    num_bytes: 1042714063.4789225
    num_examples: 4342
  - name: test.other
    num_bytes: 1061489621.2561874
    num_examples: 4716
  - name: train.other.500
    num_bytes: 50718457351.73659
    num_examples: 194626
  download_size: 54153699917
  dataset_size: 52822661036.471695
configs:
- config_name: clean
  data_files:
  - split: dev.clean
    path: clean/dev.clean-*
  - split: test.clean
    path: clean/test.clean-*
  - split: train.clean.100
    path: clean/train.clean.100-*
  - split: train.clean.360
    path: clean/train.clean.360-*
- config_name: other
  data_files:
  - split: dev.other
    path: other/dev.other-*
  - split: test.other
    path: other/test.other-*
  - split: train.other.500
    path: other/train.other.500-*
pretty_name: Filtered LibriTTS-R
---

# Dataset Card for Filtered LibriTTS-R

This is a filtered version of [LibriTTS-R](https://huggingface.co/datasets/mythicinfinity/libritts_r). The filtering draws on two sources:

1. The LibriTTS-R paper [1], which lists samples for which speech restoration failed.
2. The LibriTTS-P [2] list of [excluded speakers](https://github.com/line/LibriTTS-P/blob/main/data/excluded_spk_list.txt), i.e. speaker IDs under which speech from multiple speakers was detected.

LibriTTS-R [1] is a version of the [LibriTTS corpus](http://www.openslr.org/60/) with improved sound quality. LibriTTS, published in 2019, is a multi-speaker corpus of approximately 585 hours of read English speech sampled at 24 kHz.

## Usage

### Example

Load the `clean` config with only the `train.clean.100` split:

```py
from datasets import load_dataset

ds = load_dataset("parler-tts/libritts_r_filtered", "clean", split="train.clean.100")
```

Streaming is also supported:

```py
from datasets import load_dataset

# streaming=True returns an IterableDataset that fetches examples lazily instead of downloading the split first
ds = load_dataset("parler-tts/libritts_r_filtered", "clean", split="train.clean.100", streaming=True)
```

### Splits

There are 7 splits. Their names use dots in place of the dashes in the original dataset's split names, to comply with Hugging Face naming requirements:

- dev.clean
- dev.other
- test.clean
- test.other
- train.clean.100
- train.clean.360
- train.other.500

### Configurations

There are 2 configurations, each of which limits which splits the `load_dataset()` function downloads. Neither one is marked as the default, so you have to pass a configuration name explicitly:

- "clean": contains only the "clean" splits (dev.clean, test.clean, train.clean.100, train.clean.360)
- "other": contains only the "other" splits (dev.other, test.other, train.other.500)

### Columns

```
{
    "audio": datasets.Audio(sampling_rate=24_000),
    "text_normalized": datasets.Value("string"),
    "text_original": datasets.Value("string"),
    "speaker_id": datasets.Value("string"),
    "path": datasets.Value("string"),
    "chapter_id": datasets.Value("string"),
    "id": datasets.Value("string"),
}
```

### Example Row

```
{
  'audio': {
    'path': '/home/user/.cache/huggingface/datasets/downloads/extracted/5551a515e85b9e463062524539c2e1cb52ba32affe128dffd866db0205248bdd/LibriTTS_R/dev-clean/3081/166546/3081_166546_000028_000002.wav',
    'array': ...,
    'sampling_rate': 24000
  },
  'text_normalized': 'How quickly he disappeared!"',
  'text_original': 'How quickly he disappeared!"',
  'speaker_id': '3081',
  'path': '/home/user/.cache/huggingface/datasets/downloads/extracted/5551a515e85b9e463062524539c2e1cb52ba32affe128dffd866db0205248bdd/LibriTTS_R/dev-clean/3081/166546/3081_166546_000028_000002.wav',
  'chapter_id': '166546',
  'id': '3081_166546_000028_000002'
}
```

## Dataset Details

### Dataset Description

- **License:** CC BY 4.0

### Dataset Sources

- **Homepage:** https://www.openslr.org/141/
- **Paper:** https://arxiv.org/abs/2305.18802

## Citation

```
@ARTICLE{Koizumi2023-hs,
  title         = "{LibriTTS-R}: A restored multi-speaker text-to-speech corpus",
  author        = "Koizumi, Yuma and Zen, Heiga and Karita, Shigeki and Ding, Yifan and Yatabe, Kohei and Morioka, Nobuyuki and Bacchiani, Michiel and Zhang, Yu and Han, Wei and Bapna, Ankur",
  abstract      = "This paper introduces a new speech dataset called ``LibriTTS-R'' designed for text-to-speech (TTS) use. It is derived by applying speech restoration to the LibriTTS corpus, which consists of 585 hours of speech data at 24 kHz sampling rate from 2,456 speakers and the corresponding texts. The constituent samples of LibriTTS-R are identical to those of LibriTTS, with only the sound quality improved. Experimental results show that the LibriTTS-R ground-truth samples showed significantly improved sound quality compared to those in LibriTTS. In addition, neural end-to-end TTS trained with LibriTTS-R achieved speech naturalness on par with that of the ground-truth samples. The corpus is freely available for download from \url{http://www.openslr.org/141/}.",
  month         = may,
  year          = 2023,
  copyright     = "http://creativecommons.org/licenses/by-nc-nd/4.0/",
  archivePrefix = "arXiv",
  primaryClass  = "eess.AS",
  eprint        = "2305.18802"
}
```

```
@misc{kawamura2024librittspcorpusspeakingstyle,
  title         = {LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning},
  author        = {Masaya Kawamura and Ryuichi Yamamoto and Yuma Shirahata and Takuya Hasumi and Kentaro Tachibana},
  year          = {2024},
  eprint        = {2406.07969},
  archivePrefix = {arXiv},
  primaryClass  = {eess.AS},
  url           = {https://arxiv.org/abs/2406.07969},
}
```
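The two filtering sources described in the card can be illustrated with a minimal sketch. It assumes that the excluded-speaker list is plain text with one speaker ID per line, and that the samples with failed restoration are available as a set of utterance `id`s. The helper names (`parse_excluded_speakers`, `filter_rows`) and the toy rows are hypothetical. This is not the script parler-tts actually ran.

```python
from typing import Iterable


def parse_excluded_speakers(text: str) -> set[str]:
    """Parse a speaker list, assuming (hypothetically) one speaker ID per line."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def filter_rows(
    rows: Iterable[dict], excluded_speakers: set[str], failed_ids: set[str]
) -> list[dict]:
    """Drop rows from excluded speakers and rows whose restoration failed."""
    return [
        row
        for row in rows
        if row["speaker_id"] not in excluded_speakers and row["id"] not in failed_ids
    ]


# Toy rows that use the card's column names; apart from the first row, the values are made up.
rows = [
    {"id": "3081_166546_000028_000002", "speaker_id": "3081"},
    {"id": "1111_222222_000001_000000", "speaker_id": "1111"},
    {"id": "3333_444444_000002_000001", "speaker_id": "3333"},
]
excluded = parse_excluded_speakers("1111\n\n")
failed = {"3333_444444_000002_000001"}
kept = filter_rows(rows, excluded, failed)
print([row["id"] for row in kept])
```

On an actual `datasets.Dataset`, the same per-row predicate could be passed to `Dataset.filter` rather than to a list comprehension.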
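The card notes that the split names use dots where the original LibriTTS names use dashes. The sketch below converts between the two spellings and infers which configuration holds a given split, based on the `configs` section of the YAML metadata. The helper names are hypothetical and are not part of the `datasets` library.

```python
ORIGINAL_SPLITS = [
    "dev-clean", "dev-other", "test-clean", "test-other",
    "train-clean-100", "train-clean-360", "train-other-500",
]


def to_hf_split(original: str) -> str:
    """Convert an original LibriTTS split name (dashes) to the Hub name (dots)."""
    return original.replace("-", ".")


def config_for_split(hf_split: str) -> str:
    """Return the configuration ("clean" or "other") whose data files contain this split."""
    parts = hf_split.split(".")
    if len(parts) < 2 or parts[1] not in ("clean", "other"):
        raise ValueError(f"unknown split: {hf_split}")
    return parts[1]


for name in ORIGINAL_SPLITS:
    hf_name = to_hf_split(name)
    print(f"{name:16} -> {hf_name:16} (config: {config_for_split(hf_name)})")
```

The returned config name is what `load_dataset()` expects as its second argument, alongside `split=hf_name`.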
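In the example row, the `id` `3081_166546_000028_000002` begins with the row's `speaker_id` (`3081`) and `chapter_id` (`166546`). The card does not say what the last two fields mean. The sketch below assumes they are the paragraph and sentence indices used in LibriTTS file naming, and the `UtteranceId` type is hypothetical.

```python
from typing import NamedTuple


class UtteranceId(NamedTuple):
    speaker_id: str
    chapter_id: str
    paragraph: str  # assumed meaning of the third field
    sentence: str   # assumed meaning of the fourth field


def parse_utterance_id(utt_id: str) -> UtteranceId:
    """Split an ID such as '3081_166546_000028_000002' into its four fields."""
    parts = utt_id.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated fields, got {utt_id!r}")
    return UtteranceId(*parts)


row = {"id": "3081_166546_000028_000002", "speaker_id": "3081", "chapter_id": "166546"}
parsed = parse_utterance_id(row["id"])
# The first two fields match the row's own speaker_id and chapter_id columns.
assert parsed.speaker_id == row["speaker_id"]
assert parsed.chapter_id == row["chapter_id"]
print(parsed)
```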
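The YAML metadata gives an example count for each split and a download size for each configuration. Summing them shows roughly how much data each configuration pulls in. The figures below are only arithmetic on the card's own metadata.

```python
# num_examples per split, copied from the card's YAML metadata.
NUM_EXAMPLES = {
    "clean": {"dev.clean": 5589, "test.clean": 4689, "train.clean.100": 32215, "train.clean.360": 112326},
    "other": {"dev.other": 4342, "test.other": 4716, "train.other.500": 194626},
}
# download_size per configuration in bytes, also from the YAML metadata.
DOWNLOAD_BYTES = {"clean": 44461321972, "other": 54153699917}

totals = {config: sum(splits.values()) for config, splits in NUM_EXAMPLES.items()}
for config, total in totals.items():
    gb = DOWNLOAD_BYTES[config] / 1e9  # decimal gigabytes
    print(f"{config}: {total} examples, ~{gb:.1f} GB to download")
print(f"all splits: {sum(totals.values())} examples")
```

This prints 154,819 examples (~44.5 GB) for `clean` and 203,684 examples (~54.2 GB) for `other`, for 358,503 examples in total.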
Provided by:
parler-tts