Scicom-intl/Normalized-Multilingual-TTS
Source: Hugging Face | Updated: 2026-04-01 | Indexed: 2026-04-05
Download link:
https://hf-mirror.com/datasets/Scicom-intl/Normalized-Multilingual-TTS
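Each config in the `dataset_info` metadata of this card declares the same seven string columns and a single `train` split with its own size figures. As a rough sketch, not part of the original card, the flattened listing could be turned into one record per config as follows. The function name `parse_dataset_info` and the `INT_KEYS` tuple are hypothetical helpers introduced for illustration, and the sample text is abbreviated from the first config in the listing.

```python
from typing import Dict, Iterable, List

# Numeric fields that appear under each split in the listing.
INT_KEYS = ("num_bytes", "num_examples", "download_size", "dataset_size")


def parse_dataset_info(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Parse flattened `dataset_info` lines into one dict per config.

    Hypothetical helper: it assumes the line shapes seen in this listing
    ("- config_name: X", "features:", "splits:", "- name: Y", "key: 123")
    and tolerates missing indentation and truncated trailing configs.
    """
    configs: List[Dict[str, object]] = []
    current = None
    section = None  # "features" or "splits", whichever header was seen last
    for raw in lines:
        line = raw.strip()
        if line.startswith("- config_name:"):
            current = {"config_name": line.split(":", 1)[1].strip(), "features": []}
            configs.append(current)
            section = None
        elif current is None:
            continue  # skip anything before the first config (e.g. "---")
        elif line in ("features:", "splits:"):
            section = line[:-1]
        elif line.startswith("- name:"):
            name = line.split(":", 1)[1].strip()
            if section == "features":
                current["features"].append(name)
            else:
                current["split"] = name
        else:
            key, _, value = line.partition(":")
            value = value.strip()
            if key in INT_KEYS and value.isdigit():
                current[key] = int(value)
    return configs


# Abbreviated sample taken from the first config in the listing.
sample = """\
---
dataset_info:
- config_name: 700h-tr-turkish-text-to-speech
features:
- name: text
dtype: string
- name: speaker
dtype: string
splits:
- name: train
num_bytes: 823378
num_examples: 1089
download_size: 344319
dataset_size: 823378
"""

records = parse_dataset_info(sample.splitlines())
for record in records:
    print(record["config_name"], record["num_examples"], record["features"])
```

Summing `num_examples` over the returned records gives a quick per-corpus or overall utterance count without downloading any audio. A config that is cut off mid-listing simply yields a record without the missing keys.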
Dataset description:
---
dataset_info:
- config_name: 700h-tr-turkish-text-to-speech
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 823378
num_examples: 1089
download_size: 344319
dataset_size: 823378
- config_name: 9jalingo-hausa
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 23400821
num_examples: 77564
download_size: 5345302
dataset_size: 23400821
- config_name: 9jalingo-igbo
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 8028597
num_examples: 16958
download_size: 2583913
dataset_size: 8028597
- config_name: 9jalingo-pidgin
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 9681059
num_examples: 21148
download_size: 3322118
dataset_size: 9681059
- config_name: 9jalingo-yoruba
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 14292816
num_examples: 28621
download_size: 4912141
dataset_size: 14292816
- config_name: AISHELL3
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 13735586
num_examples: 50465
download_size: 5165319
dataset_size: 13735586
- config_name: Alexis
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 701623
num_examples: 1678
download_size: 313745
dataset_size: 701623
- config_name: AnimeVox
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 1759770
num_examples: 5767
download_size: 603796
dataset_size: 1759770
- config_name: ArVoice
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 7186340
num_examples: 8776
download_size: 567388
dataset_size: 7186340
- config_name: Arabic-Diacritized-TTS
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 1657258
num_examples: 2456
download_size: 478422
dataset_size: 1657258
- config_name: Azure-TTS-Synthetic
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 303768
num_examples: 1119
download_size: 45888
dataset_size: 303768
- config_name: Azure-TTS-annotated
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 90385882
num_examples: 161386
download_size: 33270770
dataset_size: 90385882
- config_name: Changsha_Dialect_Conversational_Speech_Corpus
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 123305
num_examples: 234
download_size: 28317
dataset_size: 123305
- config_name: ChildMandarin
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 8197820
num_examples: 31541
download_size: 2328258
dataset_size: 8197820
- config_name: ClArTTS
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 3400902
num_examples: 5871
download_size: 1248887
dataset_size: 3400902
- config_name: CommonVoice22_Sidon
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 670632202
num_examples: 1658539
download_size: 355598456
dataset_size: 670632202
- config_name: DarijaTTS-clean
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 1523288
num_examples: 3786
download_size: 493661
dataset_size: 1523288
- config_name: DisfluencySpeech
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 530728
num_examples: 1000
download_size: 225987
dataset_size: 530728
- config_name: EA-UD-DI
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 144482533
num_examples: 230703
download_size: 40588080
dataset_size: 144482533
- config_name: Elise
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 133184
num_examples: 338
download_size: 65289
dataset_size: 133184
- config_name: Emilia-NV
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 72785
num_examples: 277
download_size: 32750
dataset_size: 72785
- config_name: EmoVoice-DB
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 6615272
num_examples: 16230
download_size: 2463764
dataset_size: 6615272
- config_name: Enigma-Dataset
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 592414483
num_examples: 1240919
download_size: 190301543
dataset_size: 592414483
- config_name: FalAR
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 543341584
num_examples: 472580
download_size: 257785028
dataset_size: 543341584
- config_name: GCP-TTS
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 42026604
num_examples: 97846
download_size: 19167072
dataset_size: 42026604
- config_name: GTTS-Chirp3-Synthetic
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 9544859
num_examples: 12638
download_size: 4392073
dataset_size: 9544859
- config_name: GTTS-WaveNet-Synthetic
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 4546047
num_examples: 10027
download_size: 1587530
dataset_size: 4546047
- config_name: Habibi
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 861316
num_examples: 1774
download_size: 351593
dataset_size: 861316
- config_name: HiKE
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 320028
num_examples: 764
download_size: 160800
dataset_size: 320028
- config_name: Hindi-audio-speech
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 29304
num_examples: 23
download_size: 20819
dataset_size: 29304
- config_name: IMDA-TTS
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 1615926
num_examples: 4985
download_size: 802661
dataset_size: 1615926
- config_name: IMaSC
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 22071332
num_examples: 31052
download_size: 6116892
dataset_size: 22071332
- config_name: IndicTTS
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 43216999
num_examples: 36637
download_size: 14484565
dataset_size: 43216999
- config_name: IndicTTS_English
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 42090261
num_examples: 105158
download_size: 12401798
dataset_size: 42090261
- config_name: IndicTTS_v2
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 19268309
num_examples: 26178
download_size: 5985046
dataset_size: 19268309
- config_name: Iqra_TTS
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 13663237
num_examples: 31001
download_size: 1172633
dataset_size: 13663237
- config_name: JSS
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 853582
num_examples: 2023
download_size: 428031
dataset_size: 853582
- config_name: Japanese-Eroge-Voice
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 17836578
num_examples: 51841
download_size: 6053772
dataset_size: 17836578
- config_name: KSS
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 3003702
num_examples: 12542
download_size: 1353703
dataset_size: 3003702
- config_name: KeSpeech
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 4926242
num_examples: 15251
download_size: 1983706
dataset_size: 4926242
- config_name: Lahaja
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 702895
num_examples: 1584
download_size: 226670
dataset_size: 702895
- config_name: Latin-Audio
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 9038675
num_examples: 18250
download_size: 3738523
dataset_size: 9038675
- config_name: MasriSpeech-Full
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 2778164
num_examples: 5773
download_size: 997662
dataset_size: 2778164
- config_name: MsceneSpeech
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 139729
num_examples: 324
download_size: 59392
dataset_size: 139729
- config_name: Nanchang_Dialect_Conversational_Speech_Corpus
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 430455
num_examples: 780
download_size: 89693
dataset_size: 430455
- config_name: NonverbalTTS
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 795611
num_examples: 1653
download_size: 357832
dataset_size: 795611
- config_name: NorthTTS_audio
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 245250
num_examples: 624
download_size: 86591
dataset_size: 245250
- config_name: ORAA-MUPE-ASR
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 84794640
num_examples: 224711
download_size: 27394913
dataset_size: 84794640
- config_name: OutteTTS-urdu-dataset
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 4978213
num_examples: 9821
download_size: 1545219
dataset_size: 4978213
- config_name: ParlaSpeech-CZ
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 314743060
num_examples: 603141
download_size: 106901320
dataset_size: 314743060
- config_name: ParlaSpeech-HR
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 348677710
num_examples: 609009
download_size: 144460814
dataset_size: 348677710
- config_name: ParlaSpeech-PL
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 223015092
num_examples: 429538
download_size: 81700711
dataset_size: 223015092
- config_name: ParlaSpeech-RS
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 295664710
num_examples: 427537
download_size: 119474573
dataset_size: 295664710
- config_name: ParsiGoo
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 30693
num_examples: 56
download_size: 18369
dataset_size: 30693
- config_name: Persian-Farsi-Speech
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 3425164
num_examples: 4962
download_size: 1338786
dataset_size: 3425164
- config_name: Persian-Speech-Dataset
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 228300
num_examples: 487
download_size: 66887
dataset_size: 228300
- config_name: PersianVox_NM
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 2569540
num_examples: 5072
download_size: 931380
dataset_size: 2569540
- config_name: Quran-Recitations
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 16052572
num_examples: 26624
download_size: 1124490
dataset_size: 16052572
- config_name: Rasa
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 131554857
num_examples: 190547
download_size: 49694515
dataset_size: 131554857
- config_name: SADA22
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 13307557
num_examples: 48700
download_size: 3939741
dataset_size: 13307557
- config_name: Shanghai_Dialect_Conversational_Speech_Corpus
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 842383
num_examples: 1524
download_size: 168774
dataset_size: 842383
- config_name: Speech-MASSIVE
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 13536113
num_examples: 37396
download_size: 3787502
dataset_size: 13536113
- config_name: StoryTTS
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 5712828
num_examples: 17096
download_size: 1532435
dataset_size: 5712828
- config_name: TTS-Romanian
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 118790490
num_examples: 172176
download_size: 51076697
dataset_size: 118790490
- config_name: Taiwanese-Minnan-Sutiau
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 4623525
num_examples: 15925
download_size: 786727
dataset_size: 4623525
- config_name: Tianjin_Dialect_Conversational_Speech_Corpus
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 1527800
num_examples: 3033
download_size: 252016
dataset_size: 1527800
- config_name: Tibetan-0310
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 3000
num_examples: 12
download_size: 5098
dataset_size: 3000
- config_name: ToneWebinars
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 363630563
num_examples: 194733
download_size: 165936552
dataset_size: 363630563
- config_name: UAT
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 3927494
num_examples: 10515
download_size: 1421769
dataset_size: 3927494
- config_name: VLSP2020
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 27944988
num_examples: 48293
download_size: 11831715
dataset_size: 27944988
- config_name: Vaani
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 47614815
num_examples: 73492
download_size: 14286651
dataset_size: 47614815
- config_name: VieNeu-TTS-140h
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 46011756
num_examples: 71204
download_size: 18861309
dataset_size: 46011756
- config_name: VietMed_labeled
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 4640112
num_examples: 7371
download_size: 1548465
dataset_size: 4640112
- config_name: WaxalNLP
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 153421326
num_examples: 194119
download_size: 67342038
dataset_size: 153421326
- config_name: WenetSpeech4TTS_Premium
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 32359853
num_examples: 64709
download_size: 11582416
dataset_size: 32359853
- config_name: Wikimedia-Speech-Irish
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 5873949
num_examples: 10080
download_size: 1437175
dataset_size: 5873949
- config_name: YouTube-Cantonese
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 63231603
num_examples: 56213
download_size: 38637984
dataset_size: 63231603
- config_name: Zhengzhou_Dialect_Conversational_Speech_Corpus
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 549659
num_examples: 1039
download_size: 99414
dataset_size: 549659
- config_name: africanvoices_hau
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 40470563
num_examples: 122540
download_size: 8303004
dataset_size: 40470563
- config_name: afvoices
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 46154100
num_examples: 155222
download_size: 13214909
dataset_size: 46154100
- config_name: amharic-speech-dataset
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 21800838
num_examples: 25654
download_size: 7818043
dataset_size: 21800838
- config_name: andrew-v3
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 695749
num_examples: 1346
download_size: 226248
dataset_size: 695749
- config_name: anta_women_tts
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 5668332
num_examples: 18853
download_size: 1788557
dataset_size: 5668332
- config_name: anv_data_ke
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 119012828
num_examples: 272638
download_size: 37695435
dataset_size: 119012828
- config_name: ar-quran-hadith14books-MSA
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 5576740
num_examples: 8956
download_size: 1473137
dataset_size: 5576740
- config_name: arabic-egy-cleaned
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 15877228
num_examples: 37648
download_size: 4953054
dataset_size: 15877228
- config_name: arknights_voices
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 8282743
num_examples: 21895
download_size: 2719254
dataset_size: 8282743
- config_name: biggest-ru-book
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 259514670
num_examples: 482481
download_size: 120779278
dataset_size: 259514670
- config_name: cantonese_daily
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 900595
num_examples: 2614
download_size: 265130
dataset_size: 900595
- config_name: cml-tts
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 463633872
num_examples: 836398
download_size: 216501941
dataset_size: 463633872
- config_name: common-voice-22
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 1882754083
num_examples: 5224651
download_size: 818808092
dataset_size: 1882754083
- config_name: coral-v2
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 104754967
num_examples: 238693
download_size: 41361412
dataset_size: 104754967
- config_name: coral-v3
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 130502153
num_examples: 280941
download_size: 48587197
dataset_size: 130502153
- config_name: david-dataset
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 373952
num_examples: 1049
download_size: 134959
dataset_size: 373952
- config_name: dolly-audio-1000h-vietnamese
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 423402394
num_examples: 616453
download_size: 134899714
dataset_size: 423402394
- config_name: echo
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 22786957
num_examples: 45336
download_size: 4071603
dataset_size: 22786957
- config_name: elevenlabs_ru
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 3366444
num_examples: 4120
download_size: 1269264
dataset_size: 3366444
- config_name: emilia_zh
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 412318082
num_examples: 1222888
download_size: 213078279
dataset_size: 412318082
- config_name: everyayah
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 69508956
num_examples: 113129
download_size: 3403538
dataset_size: 69508956
- config_name: everyayah-phonemes
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 49664010
num_examples: 73172
download_size: 2420482
dataset_size: 49664010
- config_name: expresso
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 1990624
num_examples: 6767
download_size: 261458
dataset_size: 1990624
- config_name: ftspeech
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 439005540
num_examples: 871006
download_size: 161363603
dataset_size: 439005540
- config_name: gemini-flash-2.0-speech
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 33348119
num_examples: 48971
download_size: 8612445
dataset_size: 33348119
- config_name: genshin-voice
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 72240002
num_examples: 175935
download_size: 31036505
dataset_size: 72240002
- config_name: ghana-english-asr-2700hrs
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 102918624
num_examples: 136314
download_size: 41204950
dataset_size: 102918624
- config_name: google-argentinian-spanish
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 1857014
num_examples: 3690
download_size: 252403
dataset_size: 1857014
- config_name: google-colombian-spanish
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 2987600
num_examples: 6287
download_size: 271712
dataset_size: 2987600
- config_name: google_audio
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 9888735
num_examples: 15582
download_size: 2443846
dataset_size: 9888735
- config_name: greek-tts-dataset
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 1094500
num_examples: 1255
download_size: 445425
dataset_size: 1094500
- config_name: haqkiem-TTS
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 1756086
num_examples: 3915
download_size: 785479
dataset_size: 1756086
- config_name: hebrew_speech_campus
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 39045110
num_examples: 59083
download_size: 14129478
dataset_size: 39045110
- config_name: hebrew_speech_coursera
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 12662112
num_examples: 22481
download_size: 3822133
dataset_size: 12662112
- config_name: hebrew_speech_kan
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 4211171
num_examples: 8930
download_size: 1274225
dataset_size: 4211171
- config_name: highquality_nepali_female_asr
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 202403
num_examples: 353
download_size: 52612
dataset_size: 202403
- config_name: hindi_ai4bharat_indictts
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 741853
num_examples: 864
download_size: 199228
dataset_size: 741853
- config_name: hinglish-compressed
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 8724005
num_examples: 16914
download_size: 2089011
dataset_size: 8724005
- config_name: hq-conversations
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 2872328
num_examples: 8615
download_size: 845836
dataset_size: 2872328
- config_name: hungarian-single-speaker-tts
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 2439262
num_examples: 3568
download_size: 973405
dataset_size: 2439262
- config_name: idrak_ryanspeech
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 2513557
num_examples: 5912
download_size: 839128
dataset_size: 2513557
- config_name: indian-english-nptel-v0
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 189092491
num_examples: 344910
download_size: 62749530
dataset_size: 189092491
- config_name: indian_accent_english
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 2197171
num_examples: 4797
download_size: 736994
dataset_size: 2197171
- config_name: indic-Malayalam-PD
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 36378988
num_examples: 25461
download_size: 10988598
dataset_size: 36378988
- config_name: indic_hi_en_tts
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 7510045
num_examples: 14522
download_size: 2403159
dataset_size: 7510045
- config_name: indicvoices_r
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 263812600
num_examples: 253250
download_size: 83335274
dataset_size: 263812600
- config_name: indonesian-audiobook-tts
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 1112167
num_examples: 1531
download_size: 394561
dataset_size: 1112167
- config_name: japanese-Eroge-Voice-V2
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 64415517
num_examples: 129886
download_size: 18145783
dataset_size: 64415517
- config_name: japanese-anime-speech
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 11423140
num_examples: 29577
download_size: 2835984
dataset_size: 11423140
- config_name: japanese-anime-speech-v2
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 43942529
num_examples: 110371
download_size: 9278883
dataset_size: 43942529
- config_name: jenny_tts_dataset
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 5594305
num_examples: 12225
download_size: 2101564
dataset_size: 5594305
- config_name: kazakh_songs_asr
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 1296209
num_examples: 2925
download_size: 342419
dataset_size: 1296209
- config_name: kazlibri
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 1951602
num_examples: 3996
download_size: 761013
dataset_size: 1951602
- config_name: khursanirevo_chatter
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 12261718
num_examples: 28636
download_size: 4532583
dataset_size: 12261718
- config_name: kinyarwanda-tts-dataset
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 908060
num_examples: 1889
download_size: 353968
dataset_size: 908060
- config_name: korean-audio-text-develop
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 2042136
num_examples: 4576
download_size: 567405
dataset_size: 2042136
- config_name: leyu-amharic-gojjam-dialect
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 861489
num_examples: 861
download_size: 173436
dataset_size: 861489
- config_name: libritts_r_filtered
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 129493489
num_examples: 241645
download_size: 47484064
dataset_size: 129493489
- config_name: linto-dataset-audio-ar-tn
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 4228686
num_examples: 9376
download_size: 811675
dataset_size: 4228686
- config_name: lithuanian-speech-dataset
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 2367196
num_examples: 3581
download_size: 562913
dataset_size: 2367196
- config_name: magicdata
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 90711432
num_examples: 371376
download_size: 23223172
dataset_size: 90711432
- config_name: malay-audiobook
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 12657500
num_examples: 22170
download_size: 4075223
dataset_size: 12657500
- config_name: malaysian-emilia-v2
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 985576577
num_examples: 994022
download_size: 288187788
dataset_size: 985576577
- config_name: maya-audio
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 1321358
num_examples: 4701
download_size: 427078
dataset_size: 1321358
- config_name: mixed_cantonese_and_english_speech
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 1008296
num_examples: 1730
download_size: 368437
dataset_size: 1008296
- config_name: multilingual-tts
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 4276727
num_examples: 6710
download_size: 404049
dataset_size: 4276727
- config_name: naijavoices-dataset
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 666320802
num_examples: 1476572
download_size: 182263152
dataset_size: 666320802
- config_name: nchlt_speech_zul
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 13432412
num_examples: 43230
download_size: 1648507
dataset_size: 13432412
- config_name: ngochuyen_voice
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 5215508
num_examples: 5432
download_size: 2007399
dataset_size: 5215508
- config_name: npsc_nb
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 13671476
num_examples: 29249
download_size: 4827008
dataset_size: 13671476
- config_name: nst-da
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 75221805
num_examples: 204402
download_size: 24811011
dataset_size: 75221805
- config_name: omnilingual-asr-corpus
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 9886554
num_examples: 14030
download_size: 3914738
dataset_size: 9886554
- config_name: opendata-iisys-hui
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 45801768
num_examples: 71619
download_size: 19336314
dataset_size: 45801768
- config_name: opentts-lada
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 2261963
num_examples: 4958
download_size: 756781
dataset_size: 2261963
- config_name: or_in_dataset
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 9386942
num_examples: 15618
download_size: 469383
dataset_size: 9386942
- config_name: pangloss
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 11226802
num_examples: 31066
download_size: 3603834
dataset_size: 11226802
- config_name: phoaudiobook
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 517395585
num_examples: 1008937
download_size: 183921482
dataset_size: 517395585
- config_name: punjabi-asr
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 2156553
num_examples: 2079
download_size: 742742
dataset_size: 2156553
- config_name: quran-md-ayahs
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 92869988
num_examples: 125411
download_size: 4149966
dataset_size: 92869988
- config_name: raddromur_asr
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 8212884
num_examples: 10525
download_size: 4121660
dataset_size: 8212884
- config_name: ru_book_dataset
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 12103756
num_examples: 19507
download_size: 4484510
dataset_size: 12103756
- config_name: salmon-asr-smj
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 7147068
num_examples: 17502
download_size: 2212257
dataset_size: 7147068
- config_name: samromur_children
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 31752065
num_examples: 82256
download_size: 7161260
dataset_size: 31752065
- config_name: seniortalk
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 9128964
num_examples: 31152
download_size: 2541120
dataset_size: 9128964
- config_name: shrutilipi_sanskrit
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 2112216
num_examples: 2040
download_size: 611305
dataset_size: 2112216
- config_name: singaporean_accent_district_names_continuation
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 1013032
num_examples: 1524
download_size: 228338
dataset_size: 1013032
- config_name: singlish-speaker
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 580215
num_examples: 1585
download_size: 186298
dataset_size: 580215
- config_name: somali-tts-datasets
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 598083
num_examples: 1433
download_size: 162047
dataset_size: 598083
- config_name: southern-kurdish-asr
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 1602385
num_examples: 3226
download_size: 415150
dataset_size: 1602385
- config_name: sova_rudevices
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 113143179
num_examples: 183867
download_size: 35592494
dataset_size: 113143179
- config_name: stuttering_asr
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 799989
num_examples: 1957
download_size: 170060
dataset_size: 799989
- config_name: sudanese_dialect_speech
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 408264
num_examples: 922
download_size: 95458
dataset_size: 408264
- config_name: tachelhiyt-darija
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 179516
num_examples: 489
download_size: 36854
dataset_size: 179516
- config_name: tibetan_voice
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 568980
num_examples: 1615
download_size: 165915
dataset_size: 568980
- config_name: tibetan_wz_tts
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 3539
num_examples: 14
download_size: 5058
dataset_size: 3539
- config_name: tts-indo
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 33240532
num_examples: 110671
download_size: 3657581
dataset_size: 33240532
- config_name: ucla_dataset
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 572465166
num_examples: 530879
download_size: 209092154
dataset_size: 572465166
- config_name: urdu-tts-speaker3
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 291621
num_examples: 303
download_size: 120928
dataset_size: 291621
- config_name: urdu_asr_data
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 15228018
num_examples: 25392
download_size: 5783600
dataset_size: 15228018
- config_name: viVoice
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 360576836
num_examples: 780045
download_size: 144677152
dataset_size: 360576836
- config_name: vibravox_16k
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 4483350
num_examples: 11149
download_size: 1715315
dataset_size: 4483350
- config_name: vivos-vie-speech2text
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 6133323
num_examples: 11843
download_size: 1905222
dataset_size: 6133323
- config_name: vlsp-vie-speech2text1
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 32583369
num_examples: 47254
download_size: 12328939
dataset_size: 32583369
- config_name: vlsp2020_vinai_100h
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 31108628
num_examples: 48272
download_size: 11985661
dataset_size: 31108628
- config_name: voice-of-america
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 130848687
num_examples: 234394
download_size: 42769583
dataset_size: 130848687
- config_name: voices_jp
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 4631028
num_examples: 12192
download_size: 1332355
dataset_size: 4631028
- config_name: voxbox
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 57256676
num_examples: 201118
download_size: 18990651
dataset_size: 57256676
- config_name: waxal-tts
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 1584630
num_examples: 3069
download_size: 643981
dataset_size: 1584630
- config_name: yue_emo_speech
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 84258403
num_examples: 245376
download_size: 30170998
dataset_size: 84258403
- config_name: zeroth_korean
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 10518947
num_examples: 17111
download_size: 1114503
dataset_size: 10518947
- config_name: zeroth_korean_ipa
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 11014125
num_examples: 16867
download_size: 1118051
dataset_size: 11014125
- config_name: zh-taiwan
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 331431
num_examples: 1055
download_size: 133486
dataset_size: 331431
- config_name: zh-yue-tts-dataset
features:
- name: text
dtype: string
- name: speaker
dtype: string
- name: processed_text
dtype: string
- name: token_filename
dtype: string
- name: postprocessed_text
dtype: string
- name: audio_filename
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 5396774
num_examples: 12803
download_size: 1662868
dataset_size: 5396774
configs:
- config_name: 700h-tr-turkish-text-to-speech
data_files:
- split: train
path: 700h-tr-turkish-text-to-speech/train-*
- config_name: 9jalingo-hausa
data_files:
- split: train
path: 9jalingo-hausa/train-*
- config_name: 9jalingo-igbo
data_files:
- split: train
path: 9jalingo-igbo/train-*
- config_name: 9jalingo-pidgin
data_files:
- split: train
path: 9jalingo-pidgin/train-*
- config_name: 9jalingo-yoruba
data_files:
- split: train
path: 9jalingo-yoruba/train-*
- config_name: AISHELL3
data_files:
- split: train
path: AISHELL3/train-*
- config_name: Alexis
data_files:
- split: train
path: Alexis/train-*
- config_name: AnimeVox
data_files:
- split: train
path: AnimeVox/train-*
- config_name: ArVoice
data_files:
- split: train
path: ArVoice/train-*
- config_name: Arabic-Diacritized-TTS
data_files:
- split: train
path: Arabic-Diacritized-TTS/train-*
- config_name: Azure-TTS-Synthetic
data_files:
- split: train
path: Azure-TTS-Synthetic/train-*
- config_name: Azure-TTS-annotated
data_files:
- split: train
path: Azure-TTS-annotated/train-*
- config_name: Changsha_Dialect_Conversational_Speech_Corpus
data_files:
- split: train
path: Changsha_Dialect_Conversational_Speech_Corpus/train-*
- config_name: ChildMandarin
data_files:
- split: train
path: ChildMandarin/train-*
- config_name: ClArTTS
data_files:
- split: train
path: ClArTTS/train-*
- config_name: CommonVoice22_Sidon
data_files:
- split: train
path: CommonVoice22_Sidon/train-*
- config_name: DarijaTTS-clean
data_files:
- split: train
path: DarijaTTS-clean/train-*
- config_name: DisfluencySpeech
data_files:
- split: train
path: DisfluencySpeech/train-*
- config_name: EA-UD-DI
data_files:
- split: train
path: EA-UD-DI/train-*
- config_name: Elise
data_files:
- split: train
path: Elise/train-*
- config_name: Emilia-NV
data_files:
- split: train
path: Emilia-NV/train-*
- config_name: EmoVoice-DB
data_files:
- split: train
path: EmoVoice-DB/train-*
- config_name: Enigma-Dataset
data_files:
- split: train
path: Enigma-Dataset/train-*
- config_name: FalAR
data_files:
- split: train
path: FalAR/train-*
- config_name: GCP-TTS
data_files:
- split: train
path: GCP-TTS/train-*
- config_name: GTTS-Chirp3-Synthetic
data_files:
- split: train
path: GTTS-Chirp3-Synthetic/train-*
- config_name: GTTS-WaveNet-Synthetic
data_files:
- split: train
path: GTTS-WaveNet-Synthetic/train-*
- config_name: Habibi
data_files:
- split: train
path: Habibi/train-*
- config_name: HiKE
data_files:
- split: train
path: HiKE/train-*
- config_name: Hindi-audio-speech
data_files:
- split: train
path: Hindi-audio-speech/train-*
- config_name: IMDA-TTS
data_files:
- split: train
path: IMDA-TTS/train-*
- config_name: IMaSC
data_files:
- split: train
path: IMaSC/train-*
- config_name: IndicTTS
data_files:
- split: train
path: IndicTTS/train-*
- config_name: IndicTTS_English
data_files:
- split: train
path: IndicTTS_English/train-*
- config_name: IndicTTS_v2
data_files:
- split: train
path: IndicTTS_v2/train-*
- config_name: Iqra_TTS
data_files:
- split: train
path: Iqra_TTS/train-*
- config_name: JSS
data_files:
- split: train
path: JSS/train-*
- config_name: Japanese-Eroge-Voice
data_files:
- split: train
path: Japanese-Eroge-Voice/train-*
- config_name: KSS
data_files:
- split: train
path: KSS/train-*
- config_name: KeSpeech
data_files:
- split: train
path: KeSpeech/train-*
- config_name: Lahaja
data_files:
- split: train
path: Lahaja/train-*
- config_name: Latin-Audio
data_files:
- split: train
path: Latin-Audio/train-*
- config_name: MasriSpeech-Full
data_files:
- split: train
path: MasriSpeech-Full/train-*
- config_name: MsceneSpeech
data_files:
- split: train
path: MsceneSpeech/train-*
- config_name: Nanchang_Dialect_Conversational_Speech_Corpus
data_files:
- split: train
path: Nanchang_Dialect_Conversational_Speech_Corpus/train-*
- config_name: NonverbalTTS
data_files:
- split: train
path: NonverbalTTS/train-*
- config_name: NorthTTS_audio
data_files:
- split: train
path: NorthTTS_audio/train-*
- config_name: ORAA-MUPE-ASR
data_files:
- split: train
path: ORAA-MUPE-ASR/train-*
- config_name: OutteTTS-urdu-dataset
data_files:
- split: train
path: OutteTTS-urdu-dataset/train-*
- config_name: ParlaSpeech-CZ
data_files:
- split: train
path: ParlaSpeech-CZ/train-*
- config_name: ParlaSpeech-HR
data_files:
- split: train
path: ParlaSpeech-HR/train-*
- config_name: ParlaSpeech-PL
data_files:
- split: train
path: ParlaSpeech-PL/train-*
- config_name: ParlaSpeech-RS
data_files:
- split: train
path: ParlaSpeech-RS/train-*
- config_name: ParsiGoo
data_files:
- split: train
path: ParsiGoo/train-*
- config_name: Persian-Farsi-Speech
data_files:
- split: train
path: Persian-Farsi-Speech/train-*
- config_name: Persian-Speech-Dataset
data_files:
- split: train
path: Persian-Speech-Dataset/train-*
- config_name: PersianVox_NM
data_files:
- split: train
path: PersianVox_NM/train-*
- config_name: Quran-Recitations
data_files:
- split: train
path: Quran-Recitations/train-*
- config_name: Rasa
data_files:
- split: train
path: Rasa/train-*
- config_name: SADA22
data_files:
- split: train
path: SADA22/train-*
- config_name: Shanghai_Dialect_Conversational_Speech_Corpus
data_files:
- split: train
path: Shanghai_Dialect_Conversational_Speech_Corpus/train-*
- config_name: Speech-MASSIVE
data_files:
- split: train
path: Speech-MASSIVE/train-*
- config_name: StoryTTS
data_files:
- split: train
path: StoryTTS/train-*
- config_name: TTS-Romanian
data_files:
- split: train
path: TTS-Romanian/train-*
- config_name: Taiwanese-Minnan-Sutiau
data_files:
- split: train
path: Taiwanese-Minnan-Sutiau/train-*
- config_name: Tianjin_Dialect_Conversational_Speech_Corpus
data_files:
- split: train
path: Tianjin_Dialect_Conversational_Speech_Corpus/train-*
- config_name: Tibetan-0310
data_files:
- split: train
path: Tibetan-0310/train-*
- config_name: ToneWebinars
data_files:
- split: train
path: ToneWebinars/train-*
- config_name: UAT
data_files:
- split: train
path: UAT/train-*
- config_name: VLSP2020
data_files:
- split: train
path: VLSP2020/train-*
- config_name: Vaani
data_files:
- split: train
path: Vaani/train-*
- config_name: VieNeu-TTS-140h
data_files:
- split: train
path: VieNeu-TTS-140h/train-*
- config_name: VietMed_labeled
data_files:
- split: train
path: VietMed_labeled/train-*
- config_name: WaxalNLP
data_files:
- split: train
path: WaxalNLP/train-*
- config_name: WenetSpeech4TTS_Premium
data_files:
- split: train
path: WenetSpeech4TTS_Premium/train-*
- config_name: Wikimedia-Speech-Irish
data_files:
- split: train
path: Wikimedia-Speech-Irish/train-*
- config_name: YouTube-Cantonese
data_files:
- split: train
path: YouTube-Cantonese/train-*
- config_name: Zhengzhou_Dialect_Conversational_Speech_Corpus
data_files:
- split: train
path: Zhengzhou_Dialect_Conversational_Speech_Corpus/train-*
- config_name: africanvoices_hau
data_files:
- split: train
path: africanvoices_hau/train-*
- config_name: afvoices
data_files:
- split: train
path: afvoices/train-*
- config_name: amharic-speech-dataset
data_files:
- split: train
path: amharic-speech-dataset/train-*
- config_name: andrew-v3
data_files:
- split: train
path: andrew-v3/train-*
- config_name: anta_women_tts
data_files:
- split: train
path: anta_women_tts/train-*
- config_name: anv_data_ke
data_files:
- split: train
path: anv_data_ke/train-*
- config_name: ar-quran-hadith14books-MSA
data_files:
- split: train
path: ar-quran-hadith14books-MSA/train-*
- config_name: arabic-egy-cleaned
data_files:
- split: train
path: arabic-egy-cleaned/train-*
- config_name: arknights_voices
data_files:
- split: train
path: arknights_voices/train-*
- config_name: biggest-ru-book
data_files:
- split: train
path: biggest-ru-book/train-*
- config_name: cantonese_daily
data_files:
- split: train
path: cantonese_daily/train-*
- config_name: cml-tts
data_files:
- split: train
path: cml-tts/train-*
- config_name: common-voice-22
data_files:
- split: train
path: common-voice-22/train-*
- config_name: coral-v2
data_files:
- split: train
path: coral-v2/train-*
- config_name: coral-v3
data_files:
- split: train
path: coral-v3/train-*
- config_name: david-dataset
data_files:
- split: train
path: david-dataset/train-*
- config_name: dolly-audio-1000h-vietnamese
data_files:
- split: train
path: dolly-audio-1000h-vietnamese/train-*
- config_name: echo
data_files:
- split: train
path: echo/train-*
- config_name: elevenlabs_ru
data_files:
- split: train
path: elevenlabs_ru/train-*
- config_name: emilia_zh
data_files:
- split: train
path: emilia_zh/train-*
- config_name: everyayah
data_files:
- split: train
path: everyayah/train-*
- config_name: everyayah-phonemes
data_files:
- split: train
path: everyayah-phonemes/train-*
- config_name: expresso
data_files:
- split: train
path: expresso/train-*
- config_name: ftspeech
data_files:
- split: train
path: ftspeech/train-*
- config_name: gemini-flash-2.0-speech
data_files:
- split: train
path: gemini-flash-2.0-speech/train-*
- config_name: genshin-voice
data_files:
- split: train
path: genshin-voice/train-*
- config_name: ghana-english-asr-2700hrs
data_files:
- split: train
path: ghana-english-asr-2700hrs/train-*
- config_name: google-argentinian-spanish
data_files:
- split: train
path: google-argentinian-spanish/train-*
- config_name: google-colombian-spanish
data_files:
- split: train
path: google-colombian-spanish/train-*
- config_name: google_audio
data_files:
- split: train
path: google_audio/train-*
- config_name: greek-tts-dataset
data_files:
- split: train
path: greek-tts-dataset/train-*
- config_name: haqkiem-TTS
data_files:
- split: train
path: haqkiem-TTS/train-*
- config_name: hebrew_speech_campus
data_files:
- split: train
path: hebrew_speech_campus/train-*
- config_name: hebrew_speech_coursera
data_files:
- split: train
path: hebrew_speech_coursera/train-*
- config_name: hebrew_speech_kan
data_files:
- split: train
path: hebrew_speech_kan/train-*
- config_name: highquality_nepali_female_asr
data_files:
- split: train
path: highquality_nepali_female_asr/train-*
- config_name: hindi_ai4bharat_indictts
data_files:
- split: train
path: hindi_ai4bharat_indictts/train-*
- config_name: hinglish-compressed
data_files:
- split: train
path: hinglish-compressed/train-*
- config_name: hq-conversations
data_files:
- split: train
path: hq-conversations/train-*
- config_name: hungarian-single-speaker-tts
data_files:
- split: train
path: hungarian-single-speaker-tts/train-*
- config_name: idrak_ryanspeech
data_files:
- split: train
path: idrak_ryanspeech/train-*
- config_name: indian-english-nptel-v0
data_files:
- split: train
path: indian-english-nptel-v0/train-*
- config_name: indian_accent_english
data_files:
- split: train
path: indian_accent_english/train-*
- config_name: indic-Malayalam-PD
data_files:
- split: train
path: indic-Malayalam-PD/train-*
- config_name: indic_hi_en_tts
data_files:
- split: train
path: indic_hi_en_tts/train-*
- config_name: indicvoices_r
data_files:
- split: train
path: indicvoices_r/train-*
- config_name: indonesian-audiobook-tts
data_files:
- split: train
path: indonesian-audiobook-tts/train-*
- config_name: japanese-Eroge-Voice-V2
data_files:
- split: train
path: japanese-Eroge-Voice-V2/train-*
- config_name: japanese-anime-speech
data_files:
- split: train
path: japanese-anime-speech/train-*
- config_name: japanese-anime-speech-v2
data_files:
- split: train
path: japanese-anime-speech-v2/train-*
- config_name: jenny_tts_dataset
data_files:
- split: train
path: jenny_tts_dataset/train-*
- config_name: kazakh_songs_asr
data_files:
- split: train
path: kazakh_songs_asr/train-*
- config_name: kazlibri
data_files:
- split: train
path: kazlibri/train-*
- config_name: khursanirevo_chatter
data_files:
- split: train
path: khursanirevo_chatter/train-*
- config_name: kinyarwanda-tts-dataset
data_files:
- split: train
path: kinyarwanda-tts-dataset/train-*
- config_name: korean-audio-text-develop
data_files:
- split: train
path: korean-audio-text-develop/train-*
- config_name: leyu-amharic-gojjam-dialect
data_files:
- split: train
path: leyu-amharic-gojjam-dialect/train-*
- config_name: libritts_r_filtered
data_files:
- split: train
path: libritts_r_filtered/train-*
- config_name: linto-dataset-audio-ar-tn
data_files:
- split: train
path: linto-dataset-audio-ar-tn/train-*
- config_name: lithuanian-speech-dataset
data_files:
- split: train
path: lithuanian-speech-dataset/train-*
- config_name: magicdata
data_files:
- split: train
path: magicdata/train-*
- config_name: malay-audiobook
data_files:
- split: train
path: malay-audiobook/train-*
- config_name: malaysian-emilia-v2
data_files:
- split: train
path: malaysian-emilia-v2/train-*
- config_name: maya-audio
data_files:
- split: train
path: maya-audio/train-*
- config_name: mixed_cantonese_and_english_speech
data_files:
- split: train
path: mixed_cantonese_and_english_speech/train-*
- config_name: multilingual-tts
data_files:
- split: train
path: multilingual-tts/train-*
- config_name: naijavoices-dataset
data_files:
- split: train
path: naijavoices-dataset/train-*
- config_name: nchlt_speech_zul
data_files:
- split: train
path: nchlt_speech_zul/train-*
- config_name: ngochuyen_voice
data_files:
- split: train
path: ngochuyen_voice/train-*
- config_name: npsc_nb
data_files:
- split: train
path: npsc_nb/train-*
- config_name: nst-da
data_files:
- split: train
path: nst-da/train-*
- config_name: omnilingual-asr-corpus
data_files:
- split: train
path: omnilingual-asr-corpus/train-*
- config_name: opendata-iisys-hui
data_files:
- split: train
path: opendata-iisys-hui/train-*
- config_name: opentts-lada
data_files:
- split: train
path: opentts-lada/train-*
- config_name: or_in_dataset
data_files:
- split: train
path: or_in_dataset/train-*
- config_name: pangloss
data_files:
- split: train
path: pangloss/train-*
- config_name: phoaudiobook
data_files:
- split: train
path: phoaudiobook/train-*
- config_name: punjabi-asr
data_files:
- split: train
path: punjabi-asr/train-*
- config_name: quran-md-ayahs
data_files:
- split: train
path: quran-md-ayahs/train-*
- config_name: raddromur_asr
data_files:
- split: train
path: raddromur_asr/train-*
- config_name: ru_book_dataset
data_files:
- split: train
path: ru_book_dataset/train-*
- config_name: salmon-asr-smj
data_files:
- split: train
path: salmon-asr-smj/train-*
- config_name: samromur_children
data_files:
- split: train
path: samromur_children/train-*
- config_name: seniortalk
data_files:
- split: train
path: seniortalk/train-*
- config_name: shrutilipi_sanskrit
data_files:
- split: train
path: shrutilipi_sanskrit/train-*
- config_name: singaporean_accent_district_names_continuation
data_files:
- split: train
path: singaporean_accent_district_names_continuation/train-*
- config_name: singlish-speaker
data_files:
- split: train
path: singlish-speaker/train-*
- config_name: somali-tts-datasets
data_files:
- split: train
path: somali-tts-datasets/train-*
- config_name: southern-kurdish-asr
data_files:
- split: train
path: southern-kurdish-asr/train-*
- config_name: sova_rudevices
data_files:
- split: train
path: sova_rudevices/train-*
- config_name: stuttering_asr
data_files:
- split: train
path: stuttering_asr/train-*
- config_name: sudanese_dialect_speech
data_files:
- split: train
path: sudanese_dialect_speech/train-*
- config_name: tachelhiyt-darija
data_files:
- split: train
path: tachelhiyt-darija/train-*
- config_name: tibetan_voice
data_files:
- split: train
path: tibetan_voice/train-*
- config_name: tibetan_wz_tts
data_files:
- split: train
path: tibetan_wz_tts/train-*
- config_name: tts-indo
data_files:
- split: train
path: tts-indo/train-*
- config_name: ucla_dataset
data_files:
- split: train
path: ucla_dataset/train-*
- config_name: urdu-tts-speaker3
data_files:
- split: train
path: urdu-tts-speaker3/train-*
- config_name: urdu_asr_data
data_files:
- split: train
path: urdu_asr_data/train-*
- config_name: viVoice
data_files:
- split: train
path: viVoice/train-*
- config_name: vibravox_16k
data_files:
- split: train
path: vibravox_16k/train-*
- config_name: vivos-vie-speech2text
data_files:
- split: train
path: vivos-vie-speech2text/train-*
- config_name: vlsp-vie-speech2text1
data_files:
- split: train
path: vlsp-vie-speech2text1/train-*
- config_name: vlsp2020_vinai_100h
data_files:
- split: train
path: vlsp2020_vinai_100h/train-*
- config_name: voice-of-america
data_files:
- split: train
path: voice-of-america/train-*
- config_name: voices_jp
data_files:
- split: train
path: voices_jp/train-*
- config_name: voxbox
data_files:
- split: train
path: voxbox/train-*
- config_name: waxal-tts
data_files:
- split: train
path: waxal-tts/train-*
- config_name: yue_emo_speech
data_files:
- split: train
path: yue_emo_speech/train-*
- config_name: zeroth_korean
data_files:
- split: train
path: zeroth_korean/train-*
- config_name: zeroth_korean_ipa
data_files:
- split: train
path: zeroth_korean_ipa/train-*
- config_name: zh-taiwan
data_files:
- split: train
path: zh-taiwan/train-*
- config_name: zh-yue-tts-dataset
data_files:
- split: train
path: zh-yue-tts-dataset/train-*
---
# Normalized Multilingual TTS
The original dataset is [malaysia-ai/Multilingual-TTS](https://huggingface.co/datasets/malaysia-ai/Multilingual-TTS); we applied post-filtering and post-processing using [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct).
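The metadata above declares one config per source corpus, each with a single `train` split whose files match `<config_name>/train-*`. As a minimal sketch, assuming the Hugging Face `datasets` library is installed and the Hub is reachable, the configs could be enumerated as follows. `REPO_ID`, `shard_pattern`, and `list_configs` are illustrative names, not part of the dataset.

```python
from typing import List

# Repository id as shown on the dataset page.
REPO_ID = "Scicom-intl/Normalized-Multilingual-TTS"


def shard_pattern(config_name: str, split: str = "train") -> str:
    """Return the data_files glob that the card declares for a config."""
    return f"{config_name}/{split}-*"


def list_configs(repo_id: str = REPO_ID) -> List[str]:
    """Fetch the config names from the Hub (requires network access)."""
    # Imported lazily so shard_pattern works without `datasets` installed.
    from datasets import get_dataset_config_names

    return get_dataset_config_names(repo_id)
```

Calling `shard_pattern("zh-taiwan")` gives `zh-taiwan/train-*`, which matches the `path` entry listed for that config.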
## Acknowledgement
Special thanks to https://www.scitix.ai/ for the H100 node!
Provided by:
Scicom-intl
Collected summary
Dataset introduction

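Construction
In speech synthesis, building multilingual datasets often runs into inconsistent annotation standards. The Normalized-Multilingual-TTS dataset integrates many public speech corpora, such as AISHELL3, CommonVoice22_Sidon, and IndicTTS, and applies a unified preprocessing pipeline that converts the raw text into the standardized processed_text and postprocessed_text forms and links each entry to its audio file. The result is a large-scale multilingual speech dataset with a consistent structure.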
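To make the record layout concrete, here is a small sketch of how a row might be consumed. The column names come from the `dataset_info` metadata. The fallback order in `best_text` is an assumption, since the card does not say whether any text field can be empty, and the example row is hypothetical rather than taken from the corpus.

```python
from typing import Dict

# Column names taken from the dataset_info metadata; all are strings.
COLUMNS = (
    "text", "speaker", "processed_text", "token_filename",
    "postprocessed_text", "audio_filename", "source",
)


def best_text(row: Dict[str, str]) -> str:
    """Pick the most normalized non-empty text field (assumed order)."""
    for key in ("postprocessed_text", "processed_text", "text"):
        value = row.get(key)
        if value and value.strip():
            return value
    return ""


# Hypothetical row for illustration only; not real data from the corpus.
example = {
    "text": "Dr. Lee paid $5.",
    "processed_text": "Doctor Lee paid five dollars.",
    "postprocessed_text": "",
    "audio_filename": "example/0001.mp3",
}
print(best_text(example))
```

Because `postprocessed_text` is empty in this made-up row, the helper falls back to `processed_text`.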
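Usage
Researchers can use the dataset to develop and evaluate multilingual text-to-speech models. Loading a specific language config through the Hugging Face datasets library gives access to the standardized text-audio pairs. The uniform feature schema simplifies batch loading and processing, which makes the dataset suitable for training end-to-end neural speech synthesis systems and for exploring topics such as synthesis quality, speaker adaptation, and cross-lingual speech generation.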
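Loading one config could look like the sketch below, assuming the Hugging Face `datasets` library is installed and the Hub is reachable. `load_config` and `text_audio_pairs` are illustrative helper names, and skipping rows with an empty field is an assumption rather than documented behavior.

```python
from typing import Dict, Iterable, Iterator, Tuple

REPO_ID = "Scicom-intl/Normalized-Multilingual-TTS"


def load_config(config_name: str, streaming: bool = True):
    """Load the train split of one config (requires network access)."""
    # Imported lazily so the pure-Python helper below has no dependencies.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split="train", streaming=streaming)


def text_audio_pairs(
    rows: Iterable[Dict[str, str]],
) -> Iterator[Tuple[str, str]]:
    """Yield (postprocessed_text, audio_filename), skipping incomplete rows."""
    for row in rows:
        text = row.get("postprocessed_text")
        audio = row.get("audio_filename")
        if text and audio:
            yield text, audio
```

A typical call would be `for text, audio in text_audio_pairs(load_config("zh-taiwan")): ...`. Every column in the metadata is a string, so `audio_filename` is a reference to a file rather than decoded audio; the card does not state where the audio itself is stored.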
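Background and challenges
Background overview
As speech synthesis moves toward multilingual, highly natural output, the Normalized-Multilingual-TTS dataset was created to address standardization and consistency in cross-lingual speech generation. Built jointly by multiple research institutions and the community, it integrates speech-text pairs covering dozens of languages, including Turkish, Hausa, Yoruba, Chinese, and Arabic, and its core purpose is to provide normalized training resources for multilingual TTS models. By introducing features such as processed_text and postprocessed_text, the dataset advances front-end text processing for speech synthesis and offers important support for research on low-resource languages.
Current challenges
The dataset targets two core challenges in multilingual speech synthesis. The first is overcoming the large differences between languages in phonology, grammar, and orthography in order to achieve unified cross-lingual acoustic modeling. The second is annotation consistency during construction: for low-resource languages in particular, text normalization, phoneme conversion, and pronunciation variation all have to be handled. In addition, the dataset combines heterogeneous data from many sources, which raises significant challenges in format unification, quality control, and copyright compliance.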
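Common scenarios
Typical use cases
In speech synthesis, training and evaluating multilingual text-to-speech models is the core research scenario. By integrating standardized speech-text pairs for dozens of languages, including Turkish, Hausa, Igbo, Pidgin, Yoruba, Chinese, Arabic, Japanese, and Korean, the Normalized-Multilingual-TTS dataset provides a valuable resource for unified training of cross-lingual synthesis models. It is typically used to train end-to-end multilingual TTS systems and lets researchers explore language-independent acoustic modeling and prosody generation, with particular value for speech synthesis in low-resource languages.
Academic problems addressed
The dataset eases two bottlenecks in multilingual speech synthesis research: data scarcity and inconsistent formats. With standardized data that has gone through text normalization, phoneme alignment, and audio preprocessing, researchers can systematically study fundamental questions such as cross-lingual phoneme mapping, prosody transfer, and speaker identity preservation. Its significance lies in establishing a reproducible multilingual TTS benchmark environment, advancing the shift from single-language to unified multilingual modeling, and providing data infrastructure for the equitable development of speech technology worldwide.
Practical applications
In practice, the dataset supports the development and iteration of globalized intelligent voice products. Models trained on it have been applied in multilingual virtual assistants, accessible reading tools, and real-time speech translation systems. In Africa, for example, TTS systems built on the Hausa and Yoruba subsets have helped digitize localized educational resources; in East Asia, the Chinese dialect subsets provide a technical vehicle for dialect preservation and transmission. These applications substantially lower the barrier to building multilingual voice services and promote more inclusive language technology.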
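Recent research on the dataset
Latest research directions
In speech synthesis, research on multilingual and low-resource languages is becoming a frontier focus. The Normalized-Multilingual-TTS dataset integrates standardized speech-text pairs for dozens of languages, including Turkish, Hausa, Yoruba, and Chinese dialects, and provides a unified training foundation for cross-lingual speech generation models. Current research concentrates on using normalized datasets of this kind to explore few-shot transfer learning and voice style conversion, in response to the data scarcity that comes with global linguistic diversity. This direction advances the inclusiveness of speech technology and lays a key data foundation for digital humanities exchange and accessible communication.
The content above was collected and summarized by 遇见数据集.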



