Trelis/tricky-tts-public
Source: Hugging Face. Updated 2026-03-31; indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/Trelis/tricky-tts-public
Resource description:
---
license: mit
tags:
- tts
- text-to-speech
- evaluation
- benchmark
- english
language:
- en
---
# Tricky TTS
A benchmark dataset for evaluating text-to-speech (TTS) models on linguistically and
typographically challenging English text. Each row is designed to stress-test a specific
failure mode that separates capable TTS systems from weaker ones.
## Built with Trelis Studio
Evaluations were run using [Trelis Studio](https://studio.trelis.com). For custom voice model development, see [Trelis Voice AI Services](https://trelis.com/voice-ai-services/).
## Evaluation methodology
- **Round-trip ASR CER**: the TTS model generates audio → Whisper transcribes it back to text → character error rate (CER) is computed against the human reference
- **MOS (naturalness)**: mean opinion score predicted by UTMOS on the generated audio
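The CER step can be illustrated with a minimal sketch: character-level edit distance divided by the reference length. The card does not say how text is normalised before scoring, so the `normalise` helper here (lowercasing and collapsing whitespace) is an assumption, as are the function names; this is not the scoring code used by Trelis Studio.

```python
def normalise(text: str) -> str:
    """Assumed normalisation: lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance over reference length."""
    ref, hyp = normalise(reference), normalise(hypothesis)
    if not ref:
        raise ValueError("reference must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


print(cer("kitten", "sitting"))  # 0.5
```

Because insertions count as errors, CER can exceed 1.0 when the transcription is much longer than the reference.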
## Dataset
4 rows covering four challenge categories:
| Category | What it tests |
|---|---|
| `symbol_expansion` | Unicode symbols, units, operators — `≥`, `μL`, `±`, `×10⁶` |
| `abbreviation_reading` | Acronyms, initialisms, roman numerals, dotted titles — `IEEE`, `Vol. XII`, `F.A.C.C.` |
| `proper_nouns` | Irish/Celtic names, HuggingFace model paths, brand names |
| `prosody_and_punctuation` | Em-dashes, ellipses, onomatopoeia, rhythm — `zzz`, `Psst`, `whoosh` |
Columns: `text`, `category`, `spoken_form` (normalised reference transcription), `reference_audio` (human voice recording, webm), `reference_asr` (transcription of reference audio by `openai/whisper-large-v3` via Trelis Studio ASR eval).
## Usage
```python
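from datasets import load_dataset

# Load the benchmark rows from the Hugging Face Hub.
ds = load_dataset("Trelis/tricky-tts-public", split="train")

# Print each prompt alongside its challenge category.
for row in ds:
    print(row["category"], row["text"])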
```
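To feed a TTS model one challenge category at a time, the rows can be grouped by their `category` column. The sketch below works on any iterable of dict-like rows with the `text`, `category` and `spoken_form` columns described above. The helper name `group_by_category` and the two sample rows are hypothetical stand-ins so the example runs offline; they are not taken from the dataset.

```python
from collections import defaultdict


def group_by_category(rows):
    """Map each category to the list of (text, spoken_form) pairs in it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["category"]].append((row["text"], row["spoken_form"]))
    return dict(groups)


# Hypothetical stand-in rows, NOT real dataset content.
sample_rows = [
    {"category": "symbol_expansion",
     "text": "Add ≥ 5 μL",
     "spoken_form": "add greater than or equal to five microlitres"},
    {"category": "abbreviation_reading",
     "text": "Vol. XII",
     "spoken_form": "volume twelve"},
]

groups = group_by_category(sample_rows)
for category, pairs in groups.items():
    print(category, len(pairs))
```

With the real data, passing the `ds` object from the loading snippet in place of `sample_rows` should work the same way, since iterating a `datasets.Dataset` yields one dict per row.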
## Leaderboard
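Models are evaluated with round-trip ASR: the human reference is transcribed with Whisper large-v3, and generated audio is transcribed for scoring with `fireworks/whisper-v3`.
MOS comes from UTMOS; the human reference audio scores 4.22 MOS. Models are ranked by CER, lowest first.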
| Rank | Model | MOS ↑ | CER ↓ | Eval dataset |
|---|---|---|---|---|
| 1 | Gemini Pro TTS | 4.227 | 0.112 | [Trelis/tricky-tts-gemini-pro-tts](https://huggingface.co/datasets/Trelis/tricky-tts-gemini-pro-tts) |
| 2 | GPT-4o mini TTS | 4.330 | 0.121 | [Trelis/tricky-tts-gpt-4o-mini-tts](https://huggingface.co/datasets/Trelis/tricky-tts-gpt-4o-mini-tts) |
| 3 | Gemini Flash TTS | 4.184 | 0.122 | [Trelis/tricky-tts-gemini-flash-tts](https://huggingface.co/datasets/Trelis/tricky-tts-gemini-flash-tts) |
| 4 | ElevenLabs | 4.273 | 0.192 | [Trelis/tricky-tts-elevenlabs](https://huggingface.co/datasets/Trelis/tricky-tts-elevenlabs) |
| 5 | Kokoro | 4.511 | 0.209 | [Trelis/tricky-tts-kokoro](https://huggingface.co/datasets/Trelis/tricky-tts-kokoro) |
| 6 | Orpheus | 4.152 | 0.229 | [Trelis/tricky-tts-orpheus](https://huggingface.co/datasets/Trelis/tricky-tts-orpheus) |
| 7 | Cartesia Sonic-3 | 4.019 | 0.259 | [Trelis/tricky-tts-cartesia-sonic-3](https://huggingface.co/datasets/Trelis/tricky-tts-cartesia-sonic-3) |
| 8 | Piper (en-gb) | 3.777 | 0.323 | [Trelis/tricky-tts-piper-en-gb](https://huggingface.co/datasets/Trelis/tricky-tts-piper-en-gb) |
| 9 | Mistral Voxtral-Mini | 4.289 | 0.569 | [Trelis/tricky-tts-mistral](https://huggingface.co/datasets/Trelis/tricky-tts-mistral) |
| 10 | Chatterbox | 4.100 | 0.583 | [Trelis/tricky-tts-chatterbox](https://huggingface.co/datasets/Trelis/tricky-tts-chatterbox) |
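The table is ordered by CER, and the MOS column tells a somewhat different story. As a small sketch, the snippet below re-sorts the figures copied from the table by MOS and lists the models whose UTMOS score exceeds the 4.22 measured on the human reference audio. The variable names are illustrative only.

```python
HUMAN_REFERENCE_MOS = 4.22

# (model, MOS, CER) copied from the leaderboard table above.
leaderboard = [
    ("Gemini Pro TTS", 4.227, 0.112),
    ("GPT-4o mini TTS", 4.330, 0.121),
    ("Gemini Flash TTS", 4.184, 0.122),
    ("ElevenLabs", 4.273, 0.192),
    ("Kokoro", 4.511, 0.209),
    ("Orpheus", 4.152, 0.229),
    ("Cartesia Sonic-3", 4.019, 0.259),
    ("Piper (en-gb)", 3.777, 0.323),
    ("Mistral Voxtral-Mini", 4.289, 0.569),
    ("Chatterbox", 4.100, 0.583),
]

# Rank by intelligibility (lower CER is better) and by naturalness (higher MOS is better).
by_cer = sorted(leaderboard, key=lambda row: row[2])
by_mos = sorted(leaderboard, key=lambda row: row[1], reverse=True)

# Models whose predicted naturalness exceeds the human reference recording.
above_human = [model for model, mos, _ in leaderboard if mos > HUMAN_REFERENCE_MOS]

print("Best CER:", by_cer[0][0])   # Gemini Pro TTS
print("Best MOS:", by_mos[0][0])   # Kokoro
print("Above human MOS:", above_human)
```

Kokoro has the highest MOS but sits fifth by CER, and Mistral Voxtral-Mini scores above the human reference on MOS while ranking ninth on CER, so the two metrics are best read together rather than as a single ranking.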
## License
MIT
Provider:
Trelis



