Databoost/TTS_Multilingual_Data
Source: Hugging Face. Updated: 2025-02-11. Indexed: 2025-02-15.
Download link:
https://hf-mirror.com/datasets/Databoost/TTS_Multilingual_Data
Description:
TTS_Multilingual_Data is a large-scale multilingual corpus designed for linguistic analysis and the development of speech processing models. It supports tasks such as Text-to-Speech (TTS), Automatic Speech Recognition (ASR), and speaker identification. The dataset is stored in Parquet format and is intended for training and evaluating models, with evaluation based on metrics tailored to ASR and speech technologies. Its content is grouped into thematic categories such as Speeches & Conferences, Conversations & Dialogues, Media Content & Entertainment, Instructions & Voice Assistants, Informal Language & Common Expressions, Accessibility & Inclusion, and Literature & Culture.
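Because the listing does not document the column layout, a first step is usually to inspect a few rows. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the repository exposes a split named "train" (an assumption, not confirmed by the listing). The helper names `summarize_columns` and `peek` are hypothetical and introduced only for illustration.

```python
from itertools import islice

REPO_ID = "Databoost/TTS_Multilingual_Data"  # repository id taken from the listing


def summarize_columns(rows):
    """Map each column name to the sorted Python type names seen in `rows`."""
    seen = {}
    for row in rows:
        for name, value in row.items():
            seen.setdefault(name, set()).add(type(value).__name__)
    return {name: sorted(types) for name, types in seen.items()}


def peek(repo_id=REPO_ID, split="train", n=3):
    """Stream the first `n` rows; the split name "train" is an assumption."""
    from datasets import load_dataset  # third-party: pip install datasets

    # streaming=True avoids downloading the full Parquet files up front.
    stream = load_dataset(repo_id, split=split, streaming=True)
    return list(islice(stream, n))


if __name__ == "__main__":
    # Needs network access; the repository may also require authentication.
    print(summarize_columns(peek()))
```

Streaming is used here so that the schema can be checked before committing to a full download; if the split name differs, pass the actual one via the `split` argument.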
Provider:
Databoost



