CL-MASR
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/8065753
Description:
CL-MASR Dataset
This is the dataset used in the continual learning for multilingual ASR (CL-MASR) benchmark. It is composed of speech recordings from 20 languages selected from the Common Voice 13 dataset. For each language, it includes up to 10 hours of training data, 1 hour of development data, and 1 hour of test data.
The CL-MASR benchmark platform is available in the SpeechBrain toolkit (see recipes/CommonVoice):
https://github.com/speechbrain/speechbrain
The original Common Voice 13 data are available at:
https://commonvoice.mozilla.org/en/datasets
List of Languages
- English (en)
- Chinese (zh-CN)
- German (de)
- Spanish (es)
- Russian (ru)
- French (fr)
- Portuguese (pt)
- Japanese (ja)
- Turkish (tr)
- Polish (pl)
- Kinyarwanda (rw)
- Esperanto (eo)
- Kabyle (kab)
- Luganda (lg)
- Meadow Mari (mhr)
- Central Kurdish (ckb)
- Abkhaz (ab)
- Kurmanji Kurdish (kmr)
- Frisian (fy-NL)
- Interlingua (ia)
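
As a rough illustration, the language list and the per-language limits of 10, 1, and 1 hours can be written down as a small Python lookup table. This is a minimal sketch: the names `LANGUAGES`, `MAX_HOURS`, and `max_total_hours` are hypothetical and are not part of the dataset or of SpeechBrain. The total it computes is only an upper bound, since the description says "up to" for each split.

```python
# Locale codes and names copied from the list of languages above.
# LANGUAGES, MAX_HOURS and max_total_hours are illustrative names only.
LANGUAGES = {
    "en": "English",
    "zh-CN": "Chinese",
    "de": "German",
    "es": "Spanish",
    "ru": "Russian",
    "fr": "French",
    "pt": "Portuguese",
    "ja": "Japanese",
    "tr": "Turkish",
    "pl": "Polish",
    "rw": "Kinyarwanda",
    "eo": "Esperanto",
    "kab": "Kabyle",
    "lg": "Luganda",
    "mhr": "Meadow Mari",
    "ckb": "Central Kurdish",
    "ab": "Abkhaz",
    "kmr": "Kurmanji Kurdish",
    "fy-NL": "Frisian",
    "ia": "Interlingua",
}

# Per-language upper bounds in hours for each split, as stated in the description.
MAX_HOURS = {"train": 10, "dev": 1, "test": 1}


def max_total_hours(languages=LANGUAGES, max_hours=MAX_HOURS):
    """Upper bound on total audio hours if every language filled every split."""
    return len(languages) * sum(max_hours.values())


if __name__ == "__main__":
    print(f"{len(LANGUAGES)} languages")
    print(f"at most {max_total_hours()} hours of audio in total")
```

Languages with less available data may fall short of these limits, so the actual size of the dataset can be smaller than this bound.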
Created:
2023-06-28



