
DiaMoE-TTS_IPA_Trainingset

Source: ModelScope community. Updated 2026-01-04. Indexed 2025-12-06.
Download link:
https://modelscope.cn/datasets/giantailab/DiaMoE-TTS_IPA_Trainingset
Description:
# DiaMoE-TTS: A Unified IPA-based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation

GitHub: [DiaMoE-TTS](https://github.com/GiantAILab/DiaMoE-TTS)

We use the [Common Voice Cantonese dataset](https://arxiv.org/abs/1912.06670), the [Emilia Mandarin dataset](https://arxiv.org/abs/2407.05361), dialectal data from the [KeSpeech corpus](https://openreview.net/forum?id=b3Zoeq2sCLq), and an open-source [Southern Min dataset](https://sutian.moe.edu.tw/zh-hant/siongkuantsuguan/) for training. We release only the IPA frontend portion of the open-source datasets here.

## Short Intro

Dialect speech embodies rich cultural and linguistic diversity, yet building text-to-speech (TTS) systems for dialects remains challenging due to scarce data, inconsistent orthographies, and complex phonetic variation. To address these issues, we present DiaMoE-TTS, a unified IPA-based framework that standardizes phonetic representations and resolves grapheme-to-phoneme ambiguities.

Built upon the F5-TTS architecture, the system introduces a dialect-aware Mixture-of-Experts (MoE) to model phonological differences and employs parameter-efficient adaptation with Low-Rank Adaptors (LoRA) and Conditioning Adapters for rapid transfer to new dialects. Unlike approaches that depend on large-scale or proprietary resources, DiaMoE-TTS enables scalable synthesis driven by open data.

Experiments demonstrate natural and expressive speech generation, achieving zero-shot performance on unseen dialects and specialized domains such as Peking Opera with only a few hours of data.

Provider:
maas
Created:
2025-11-28