DiaMoE-TTS_IPA_Trainingset

Name: DiaMoE-TTS_IPA_Trainingset
Creator: maas
Published: 2026-01-04 16:56:12
License: not specified

Source: ModelScope community. Updated 2026-01-04; indexed 2025-12-06.

Download link:

https://modelscope.cn/datasets/giantailab/DiaMoE-TTS_IPA_Trainingset

Resource description:

# DiaMoE-TTS: A Unified IPA-based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation

GitHub: [DiaMoE-TTS](https://github.com/GiantAILab/DiaMoE-TTS)

For training, we use the [Common Voice Cantonese dataset](https://arxiv.org/abs/1912.06670), the [Emilia Mandarin dataset](https://arxiv.org/abs/2407.05361), dialectal data from the [KeSpeech corpus](https://openreview.net/forum?id=b3Zoeq2sCLq), and an open-source [Southern Min dataset](https://sutian.moe.edu.tw/zh-hant/siongkuantsuguan/). We release only the IPA frontend for the open-source datasets here.

## Short Intro

Dialect speech embodies rich cultural and linguistic diversity, yet building text-to-speech (TTS) systems for dialects remains challenging because of scarce data, inconsistent orthographies, and complex phonetic variation. To address these issues, we present DiaMoE-TTS, a unified IPA-based framework that standardizes phonetic representations and resolves grapheme-to-phoneme ambiguities. Built upon the F5-TTS architecture, the system introduces a dialect-aware Mixture-of-Experts (MoE) to model phonological differences, and it employs parameter-efficient adaptation with Low-Rank Adaptors (LoRA) and Conditioning Adapters for rapid transfer to new dialects. Unlike approaches that depend on large-scale or proprietary resources, DiaMoE-TTS enables scalable synthesis driven by open data. Experiments demonstrate natural and expressive speech generation, achieving zero-shot performance on unseen dialects and on specialized domains such as Peking Opera with only a few hours of data.
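To make the Mixture-of-Experts idea in the intro more concrete, here is a minimal sketch of generic softmax-gated expert mixing in plain Python. It is not the DiaMoE-TTS implementation: the function names (`softmax`, `linear`, `moe_forward`), the toy experts, and the gate weights are all hypothetical, and the real system's gating, expert design, and dialect conditioning may differ.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Sequence[float]], Vector]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Matrix-vector product: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def moe_forward(
    x: Sequence[float],
    gate_weights: Sequence[Sequence[float]],
    experts: Sequence[Expert],
) -> Tuple[Vector, Vector]:
    """Dense MoE layer: mix all expert outputs with softmax gate weights.

    `gate_weights` has one row per expert (an assumption of this sketch).
    Returns the mixed output and the gate distribution.
    """
    gate = softmax(linear(gate_weights, x))
    outputs = [expert(x) for expert in experts]
    dim = len(outputs[0])
    mixed = [sum(g * out[d] for g, out in zip(gate, outputs)) for d in range(dim)]
    return mixed, gate


# Toy setup: two hypothetical "experts" standing in for dialect-specific modules.
experts: List[Expert] = [
    lambda v: [2.0 * a for a in v],   # expert 0 scales the input
    lambda v: [a + 1.0 for a in v],   # expert 1 shifts the input
]
gate_weights = [[1.0, 0.0], [0.0, 1.0]]

mixed, gate = moe_forward([0.5, 0.5], gate_weights, experts)
print(gate, mixed)
```

In this toy case both gate scores are equal, so each expert contributes half of the output. A dialect-aware gate would instead be trained so that inputs from different dialects favor different experts.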

Provider: maas

Created: 2025-11-28
