
Trelis/medical-terms-2025

Source: Hugging Face. Updated 2026-04-09; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Trelis/medical-terms-2025
Dataset description:
---
license: mit
task_categories:
- automatic-speech-recognition
language:
- en
tags:
- medical
- asr
- entity-cer
- benchmark
- tts
size_categories:
- n<1K
---

# Medical Terms 2025: Medical ASR Benchmark

Entity-aware medical ASR benchmark: 50 hard rows with synthetic TTS audio of 2025 drug/condition terminology.

Prepared by Trelis Research. Watch more on [YouTube](https://youtube.com/@TrelisResearch) or inquire about our custom voice AI (ASR/TTS) services [here](https://trelis.com/voice-ai-services).

## Source

84 manually curated terms from 2025 FDA/EMA/WHO primary sources. Each term has `source_url`, `source_date`, and `source_quality`. Sentences generated by Gemini 2.5 Flash. Audio by Kokoro TTS via [Trelis Studio](https://studio.trelis.com) (9 voices, round-robin).

## Preparation

1. 1 sentence per term (Gemini 2.5 Flash)
2. Kokoro TTS (9 voices)
3. String-search entity tagging
4. 3-model difficulty filter (whisper-large-v3, canary-1b-v2, Voxtral-Mini) with whisper-english normalization
5. Top-50 by median entity CER

## Entity categories

- **drug**: drug or medication names (brand or INN)
- **condition**: diagnoses, diseases, syndromes, disorders
- **procedure**: surgical, diagnostic, or therapeutic procedures
- **anatomy**: anatomical structures, organs, body regions
- **biomarker**: lab tests, biomarkers, genes, proteins, molecular markers
- **organisation**: hospitals, regulatory bodies, pharmaceutical companies

## Columns

- `audio`: 24 kHz WAV (Kokoro TTS)
- `text`: ground truth sentence
- `keyword`: the target medical term
- `category`: entity category
- `voice`: Kokoro voice ID
- `entities`: JSON array of tagged medical entities
- `difficulty_rank`: 1 = hardest
- `median_entity_cer`: median entity CER across the 3 difficulty-filter models

## Leaderboard (16 models, sorted by Entity CER)

| # | Model | WER | CER | Entity CER | Results |
|---|---|---|---|---|---|
| 1 | gemini-2.5-pro | 0.048 | 0.018 | 0.138 | [results](https://huggingface.co/datasets/Trelis/eval-gemini-2.5-pro-medical-terms-2025-20260408-1927) |
| 2 | scribe-v2 | 0.062 | 0.023 | 0.172 | [results](https://huggingface.co/datasets/Trelis/eval-scribe-v2-medical-terms-2025-20260408-1929) |
| 3 | universal-3-pro | 0.069 | 0.024 | 0.181 | [results](https://huggingface.co/datasets/Trelis/eval-universal-3-pro-medical-terms-2025-20260408-1928) |
| 4 | nova-3 | 0.080 | 0.025 | 0.183 | [results](https://huggingface.co/datasets/Trelis/eval-nova-3-medical-terms-2025-20260408-1929) |
| 5 | whisper-v3 (fireworks) | 0.090 | 0.030 | 0.198 | [results](https://huggingface.co/datasets/Trelis/eval-whisper-v3-medical-terms-2025-20260408-1933) |
| 6 | whisper-large-v3 | 0.089 | 0.028 | 0.200 | [results](https://huggingface.co/datasets/Trelis/eval-whisper-large-v3-medical-terms-2025-20260408-1926) |
| 7 | canary-1b-v2 | 0.101 | 0.033 | 0.211 | [results](https://huggingface.co/datasets/Trelis/eval-canary-1b-v2-medical-terms-2025-20260408-1926) |
| 8 | whisper-large-v3-turbo | 0.094 | 0.032 | 0.227 | [results](https://huggingface.co/datasets/Trelis/eval-whisper-large-v3-turbo-medical-terms-2025-20260408-1926) |
| 9 | ursa-2-enhanced | 0.060 | 0.033 | 0.233 | [results](https://huggingface.co/datasets/Trelis/eval-ursa-2-enhanced-medical-terms-2025-20260408-1928) |
| 10 | Voxtral-Mini-3B-2507 | 0.081 | 0.034 | 0.237 | [results](https://huggingface.co/datasets/Trelis/eval-Voxtral-Mini-3B-2507-medical-terms-2025-20260408-1926) |
| 11 | parakeet-tdt-0.6b-v3 | 0.113 | 0.036 | 0.246 | [results](https://huggingface.co/datasets/Trelis/eval-parakeet-tdt-0.6b-v3-medical-terms-2025-20260408-1926) |
| 12 | MultiMed-ST (whisper-small-en) | 0.134 | 0.049 | 0.259 | [results](https://huggingface.co/datasets/Trelis/eval-whisper-small-english-medical-terms-2025-20260408-1931) |
| 13 | whisper-small | 0.129 | 0.042 | 0.264 | [results](https://huggingface.co/datasets/Trelis/eval-whisper-small-medical-terms-2025-20260408-1929) |
| 14 | whisper-base | 0.179 | 0.055 | 0.275 | [results](https://huggingface.co/datasets/Trelis/eval-whisper-base-medical-terms-2025-20260408-1925) |
| 15 | whisper-tiny | 0.207 | 0.063 | 0.309 | [results](https://huggingface.co/datasets/Trelis/eval-whisper-tiny-medical-terms-2025-20260408-1925) |
| 16 | medasr | 0.150 | 0.058 | 0.323 | [results](https://huggingface.co/datasets/Trelis/eval-medasr-medical-terms-2025-20260409-1107) |

Evaluated with [Trelis Studio](https://studio.trelis.com), whisper-english normalization.
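
## Illustrative sketch: entity CER and difficulty ranking

The card reports Entity CER and a top-50 selection by median entity CER, but it does not spell out how either is computed. The Python sketch below shows one plausible reading, not the Trelis Studio implementation. It assumes that each tagged entity is matched against the closest substring of the model transcript, that character edits are summed over entities and divided by total entity length, and that rows are ranked by the median score across models with the highest median first. The function names (`normalize`, `best_span_distance`, `entity_cer`, `rank_by_difficulty`) are hypothetical, and `normalize` is a simplified stand-in for Whisper's English normalizer. Entities are passed as plain strings, since the schema of the `entities` column is not documented here.

```python
import re
from statistics import median


def normalize(text: str) -> str:
    """Simplified stand-in (assumption) for whisper-english normalization."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def best_span_distance(entity: str, hypothesis: str) -> int:
    """Smallest Levenshtein distance between `entity` and any substring of `hypothesis`."""
    # Row 0 is all zeros, so a match may start anywhere in the hypothesis.
    prev = [0] * (len(hypothesis) + 1)
    for i, e in enumerate(entity, start=1):
        curr = [i]  # matching against an empty span costs i deletions
        for j, h in enumerate(hypothesis, start=1):
            curr.append(min(
                prev[j] + 1,             # entity character missing from hypothesis
                curr[j - 1] + 1,         # extra character in hypothesis
                prev[j - 1] + (e != h),  # substitution (cost 1) or match (cost 0)
            ))
        prev = curr
    # Taking the minimum of the last row lets the match end anywhere.
    return min(prev)


def entity_cer(entities: list[str], hypothesis: str) -> float:
    """Assumed definition: summed entity edit distance / summed entity length."""
    hyp = normalize(hypothesis)
    errors = chars = 0
    for entity in entities:
        ref = normalize(entity)
        errors += best_span_distance(ref, hyp)
        chars += len(ref)
    return errors / chars if chars else 0.0


def rank_by_difficulty(per_model_cer: dict[str, list[float]], top_k: int = 50):
    """Rank rows by median entity CER across models, hardest (highest) first."""
    scored = [(key, median(scores)) for key, scores in per_model_cer.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Made-up transcript for illustration; not taken from the dataset.
score = entity_cer(["lecanemab"], "The patient started lecanumab last week.")
print(round(score, 3))  # 0.111 (one substituted character out of nine)
```

Matching against the best substring, rather than the whole transcript, keeps errors elsewhere in the sentence from inflating the entity score. That is the motivation for reporting Entity CER alongside sentence-level WER and CER.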
Provider:
Trelis