
Trelis/tricky-tts-public

Source: Hugging Face. Updated 2026-03-31. Indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/Trelis/tricky-tts-public
Resource description:
---
license: mit
tags:
- tts
- text-to-speech
- evaluation
- benchmark
- english
language:
- en
---

# Tricky TTS

A benchmark dataset for evaluating text-to-speech (TTS) models on linguistically and typographically challenging English text. Each row is designed to stress-test a specific failure mode that separates capable TTS systems from weaker ones.

## Built with Trelis Studio

Evaluations were run using [Trelis Studio](https://studio.trelis.com). For custom voice model development, see [Trelis Voice AI Services](https://trelis.com/voice-ai-services/).

## Evaluation methodology

- **Round-trip ASR CER**: TTS model generates audio → Whisper transcribes back → CER vs human reference
- **MOS (naturalness)**: UTMOS score on generated audio

## Dataset

4 rows covering four challenge categories:

| Category | What it tests |
|---|---|
| `symbol_expansion` | Unicode symbols, units, operators: `≥`, `μL`, `±`, `×10⁶` |
| `abbreviation_reading` | Acronyms, initialisms, roman numerals, dotted titles: `IEEE`, `Vol. XII`, `F.A.C.C.` |
| `proper_nouns` | Irish/Celtic names, HuggingFace model paths, brand names |
| `prosody_and_punctuation` | Em-dashes, ellipses, onomatopoeia, rhythm: `zzz`, `Psst`, `whoosh` |

Columns:

- `text`
- `category`
- `spoken_form` (normalised reference transcription)
- `reference_audio` (human voice recording, webm)
- `reference_asr` (transcription of reference audio by `openai/whisper-large-v3` via Trelis Studio ASR eval)

## Usage

```python
from datasets import load_dataset

ds = load_dataset("Trelis/tricky-tts-public", split="train")
for row in ds:
    print(row["category"], row["text"])
```

## Leaderboard

Evaluated with round-trip ASR (Whisper large-v3 human reference, `fireworks/whisper-v3` scoring). MOS from UTMOS. Human reference audio scored at 4.22 MOS.

| Rank | Model | MOS ↑ | CER ↓ | Eval dataset |
|---|---|---|---|---|
| 1 | Gemini Pro TTS | 4.227 | 0.112 | [Trelis/tricky-tts-gemini-pro-tts](https://huggingface.co/datasets/Trelis/tricky-tts-gemini-pro-tts) |
| 2 | GPT-4o mini TTS | 4.330 | 0.121 | [Trelis/tricky-tts-gpt-4o-mini-tts](https://huggingface.co/datasets/Trelis/tricky-tts-gpt-4o-mini-tts) |
| 3 | Gemini Flash TTS | 4.184 | 0.122 | [Trelis/tricky-tts-gemini-flash-tts](https://huggingface.co/datasets/Trelis/tricky-tts-gemini-flash-tts) |
| 4 | ElevenLabs | 4.273 | 0.192 | [Trelis/tricky-tts-elevenlabs](https://huggingface.co/datasets/Trelis/tricky-tts-elevenlabs) |
| 5 | Kokoro | 4.511 | 0.209 | [Trelis/tricky-tts-kokoro](https://huggingface.co/datasets/Trelis/tricky-tts-kokoro) |
| 6 | Orpheus | 4.152 | 0.229 | [Trelis/tricky-tts-orpheus](https://huggingface.co/datasets/Trelis/tricky-tts-orpheus) |
| 7 | Cartesia Sonic-3 | 4.019 | 0.259 | [Trelis/tricky-tts-cartesia-sonic-3](https://huggingface.co/datasets/Trelis/tricky-tts-cartesia-sonic-3) |
| 8 | Piper (en-gb) | 3.777 | 0.323 | [Trelis/tricky-tts-piper-en-gb](https://huggingface.co/datasets/Trelis/tricky-tts-piper-en-gb) |
| 9 | Mistral Voxtral-Mini | 4.289 | 0.569 | [Trelis/tricky-tts-mistral](https://huggingface.co/datasets/Trelis/tricky-tts-mistral) |
| 10 | Chatterbox | 4.100 | 0.583 | [Trelis/tricky-tts-chatterbox](https://huggingface.co/datasets/Trelis/tricky-tts-chatterbox) |

## License

MIT
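As a rough illustration of the round-trip CER metric named under "Evaluation methodology", the sketch below computes a character error rate as character-level Levenshtein distance divided by reference length. The card does not say how text is normalised, which CER implementation Trelis Studio uses, or which column (`spoken_form` or `reference_asr`) serves as the reference, so the `normalise` step, the function names, and the sample strings here are all assumptions for illustration only.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between ref and hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def normalise(text: str) -> str:
    # Assumed normalisation (not taken from the card):
    # lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    ref = normalise(reference)
    hyp = normalise(hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical strings, not rows from the dataset.
reference_text = "greater than or equal to five microlitres"
asr_of_tts_audio = "greater than or equal to five micro liters"
print(f"CER = {cer(reference_text, asr_of_tts_audio):.3f}")
```

In a real evaluation the hypothesis would be the ASR transcript of each model's generated audio, and the per-row values would be aggregated into the single CER figure reported per model; how that aggregation is done is not stated in the card.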

Provider:
Trelis