
MediaParl

Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/records/4288410
Description:
MediaParl is a Swiss-accented bilingual database containing recordings in both French and German as they are spoken in Switzerland. The data were recorded at the Valais Parliament. Valais is a bilingual Swiss canton with many local accents and dialects. The database therefore contains highly variable data and is suitable for studying multilingual, accented and non-native speech recognition, as well as language identification and language switch detection.

The corpus is partitioned into training, development and test sets. Because the focus is on bilingual (accented, non-native) speech, the test set (MediaParl-TST) contains all the speakers who speak in both languages. The remaining (non-bilingual) speakers were randomly assigned to the training set (MediaParl-TRN) and the development set (MediaParl-DEV) in a proportion of 9 to 1.

MediaParl-TRN contains 11,425 sentences (5,471 in French and 5,955 in German) spoken by 180 different speakers. MediaParl-DEV contains 1,525 sentences (646 in French and 879 in German) from 17 different speakers. MediaParl-TST contains 2,617 sentences (925 in French and 1,692 in German) from 7 different speakers. Each test speaker uses both languages, but we assume that each speaker naturally speaks more often in their mother tongue. Four speakers are native German speakers and three are native French speakers.

Reference paper: David Imseng, Hervé Bourlard, Holger Caesar, Philip N. Garner, Gwénolé Lecorvé and Alexandre Nanchen, "MediaParl: Bilingual mixed language accented speech database", in: Proceedings of the 2012 IEEE Workshop on Spoken Language Technology, 2012.
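The partitioning rule described above (bilingual speakers go to the test set, the remaining speakers are split at random between training and development) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' actual procedure: the function name `partition_speakers`, the `speaker_languages` mapping, the `dev_fraction` parameter and the seed are all hypothetical, and the sketch applies the 9-to-1 proportion to speaker counts, which the description does not specify.

```python
import random
from typing import Dict, List, Set, Tuple


def partition_speakers(
    speaker_languages: Dict[str, Set[str]],
    dev_fraction: float = 0.1,  # assumption: "9 to 1" applied to speaker counts
    seed: int = 0,              # assumption: fixed seed for reproducibility
) -> Tuple[List[str], List[str], List[str]]:
    """Return (train, dev, test) speaker lists with no speaker overlap."""
    # Speakers recorded in more than one language form the test set.
    test = sorted(s for s, langs in speaker_languages.items() if len(langs) > 1)

    # All other speakers are shuffled and split between dev and train.
    rest = sorted(s for s, langs in speaker_languages.items() if len(langs) <= 1)
    rng = random.Random(seed)
    rng.shuffle(rest)

    n_dev = round(len(rest) * dev_fraction)
    dev = sorted(rest[:n_dev])
    train = sorted(rest[n_dev:])
    return train, dev, test


if __name__ == "__main__":
    # Toy data (hypothetical speaker IDs, not taken from the corpus).
    toy = {f"spk{i:02d}": {"fr"} for i in range(10)}
    toy.update({f"spk{i:02d}": {"de"} for i in range(10, 20)})
    toy.update({"spk20": {"fr", "de"}, "spk21": {"fr", "de"}})

    train, dev, test = partition_speakers(toy)
    print(len(train), len(dev), len(test))
```

Splitting by speaker rather than by sentence keeps the three sets speaker-disjoint, which matches the description's statement that the test set holds all bilingual speakers.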
Created:
2023-03-08