TTS-Multilingual-Test-Set
Source: ModelScope community. Updated: 2026-01-06. Indexed: 2025-05-17.
Download link:
https://modelscope.cn/datasets/MiniMax/TTS-Multilingual-Test-Set
Description:
## Overview
To assess the multilingual zero-shot voice cloning capabilities of TTS models, we have constructed a test set encompassing 24 languages. This dataset provides both audio samples for voice cloning and corresponding test texts.
Specifically, the test set for each language includes:
- 100 distinct test sentences.
- Audio samples from two speakers (one male and one female) carefully selected from the Mozilla Common Voice (MCV) dataset, intended for voice cloning.
Researchers can clone the target voices using the provided audio samples and then synthesize the test texts. The resulting synthetic audio can be evaluated with metrics such as Word Error Rate (WER) and speaker similarity (SIM), e.g., using [seed-tts-eval](https://github.com/BytedanceSpeech/seed-tts-eval).
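As a rough illustration of the two metrics, the sketch below computes WER as word-level edit distance divided by the reference length, and SIM as the cosine similarity between two speaker-embedding vectors, which is how SIM is commonly computed. It is not the seed-tts-eval implementation. The function names are hypothetical, and the hypothesis transcript and the embedding vectors are assumed to come from an ASR system and a speaker-embedding model of your choice. The `char_level` flag is likewise an assumption, added because character-level scoring is a common choice for languages written without spaces.

```python
from math import sqrt


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis, char_level=False):
    """Error rate of a hypothesis transcript against the reference text."""
    if char_level:
        # Treat each non-space character as one token.
        ref = list(reference.replace(" ", ""))
        hyp = list(hypothesis.replace(" ", ""))
    else:
        ref = reference.split()
        hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / norm
```

In practice, both texts are usually normalized (case, punctuation) before scoring. The sketch leaves that step out, so scores will differ from those of an established toolkit.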
## 24 Languages
Chinese, English, Cantonese, Japanese, Korean, Arabic, Spanish, Turkish, Indonesian, Portuguese, French, Italian, Dutch, Vietnamese, German, Russian, Ukrainian, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi.
## Data Format
The dataset is organized as follows:
```
├── speaker/...
│   Contains two audio files for each language (corresponding to male and female speakers).
│   Transcriptions for these audio files are also provided in prompt_text.txt.
└── text/...
    Contains a test text file for each language.
    Each line in the file follows the format: cloning_audio_filename|text_to_be_synthesized
    Example (Korean): korean_female|내 나이 아홉살 처음 알게 된 사실이 잇었다. 생일 초를 끄기 전에는 소원을 빌어야 한다는 것.
    (Here, korean_female refers to the corresponding Korean female speaker audio filename in the speaker/ directory, used for voice cloning.)
```
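The line format described above can be read with a few lines of Python. This is a minimal sketch: the helper names `parse_test_line` and `group_by_prompt` are hypothetical, and it assumes the first `|` on each line is the separator between the audio filename and the text.

```python
from collections import defaultdict


def parse_test_line(line):
    """Split one 'cloning_audio_filename|text_to_be_synthesized' line."""
    # Split on the first '|' only, in case the text itself contains one.
    prompt_name, sep, text = line.rstrip("\r\n").partition("|")
    prompt_name, text = prompt_name.strip(), text.strip()
    if not sep or not prompt_name or not text:
        raise ValueError(f"malformed line: {line!r}")
    return prompt_name, text


def group_by_prompt(lines):
    """Map each cloning-audio filename to the texts to synthesize with it."""
    groups = defaultdict(list)
    for line in lines:
        if line.strip():  # skip blank lines
            name, text = parse_test_line(line)
            groups[name].append(text)
    return dict(groups)


# The Korean example line from the format description.
sample = [
    "korean_female|내 나이 아홉살 처음 알게 된 사실이 잇었다. "
    "생일 초를 끄기 전에는 소원을 빌어야 한다는 것.",
]
for name, texts in group_by_prompt(sample).items():
    print(name, len(texts))
```

To process a real file, pass an open file handle (opened with `encoding="utf-8"`) to `group_by_prompt`. The file names under `text/` are not listed here, so check the downloaded dataset for the actual paths.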
## Future Plans
We plan to expand this dataset in the future by adding more languages. Our goal is to establish it as a standard benchmark for evaluating and comparing multilingual TTS models.
Paper: https://huggingface.co/papers/2505.07916
Project page: https://minimax-ai.github.io/tts_tech_report
## Citation
```
@misc{minimax2025minimaxspeechintrinsiczeroshottexttospeech,
      title={MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder},
      author={Bowen Zhang and Congchao Guo and Geng Yang and Hang Yu and Haozhe Zhang and Heidi Lei and Jialong Mai and Junjie Yan and Kaiyue Yang and Mingqi Yang and Peikai Huang and Ruiyang Jin and Sitan Jiang and Weihua Cheng and Yawei Li and Yichen Xiao and Yiying Zhou and Yongmao Zhang and Yuan Lu and Yucen He},
      year={2025},
      eprint={2505.07916},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2505.07916},
}
```
Provider: maas
Created: 2025-05-14
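## Background
TTS-Multilingual-Test-Set is a dataset for evaluating multilingual zero-shot voice cloning. It contains audio samples and test texts for 24 languages, with audio from two speakers and 100 test sentences per language. The dataset is released under the Apache License 2.0 and is intended to become a standard benchmark for evaluating multilingual TTS models.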



