MuST-C
Source: arXiv (indexed 2025-09-30)
Download link:
https://ict.fbk.eu/must-c
Resource summary:
MuST-C is a multilingual speech translation corpus of English speech paired with text translations; in this entry it is used to train and test simultaneous speech translation models. Validation uses the dev split, with 1,423 English-German and 1,316 English-Spanish segment pairs. Testing uses the tst-COMMON split, with 2,641 English-German and 2,502 English-Spanish segment pairs. The corpus contains 408 hours of speech for English-German and 504 hours for English-Spanish. Task type: simultaneous speech translation.
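As a sanity check on a local copy, the split sizes above can be compared against the number of aligned text lines. The sketch below is a minimal illustration, not official MuST-C tooling: it assumes the unpacked release stores each split as line-aligned text files under a layout like `en-de/data/<split>/txt/<split>.en` and `<split>.de` (verify this against your download), and the function names `load_split_pairs` and `check_split` are hypothetical. It reads only the text side; audio segments are not handled.

```python
"""Minimal sketch: read one MuST-C split as (English, target) text pairs.

ASSUMPTION: each language pair is unpacked as
<root>/en-<tgt>/data/<split>/txt/<split>.en and <split>.<tgt>,
one segment per line, aligned by line number. Check your copy.
"""
from pathlib import Path
from typing import List, Tuple


def _read_lines(path: Path) -> List[str]:
    # Iterating a text file splits only on newlines; str.splitlines() would
    # also split on rarer Unicode separators and could miscount segments.
    with path.open(encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def load_split_pairs(root: Path, tgt_lang: str, split: str) -> List[Tuple[str, str]]:
    """Return aligned (English, target) segment pairs for one split (hypothetical helper)."""
    txt_dir = root / f"en-{tgt_lang}" / "data" / split / "txt"
    src = _read_lines(txt_dir / f"{split}.en")
    tgt = _read_lines(txt_dir / f"{split}.{tgt_lang}")
    # A parallel split must have the same number of segments on both sides.
    if len(src) != len(tgt):
        raise ValueError(f"{split}: {len(src)} source vs {len(tgt)} target lines")
    return list(zip(src, tgt))


# Segment-pair counts taken from the resource summary, used as expected values.
EXPECTED_PAIRS = {
    ("de", "dev"): 1423,
    ("es", "dev"): 1316,
    ("de", "tst-COMMON"): 2641,
    ("es", "tst-COMMON"): 2502,
}


def check_split(root: Path, tgt_lang: str, split: str) -> bool:
    """True if the local split has the pair count reported in the summary."""
    return len(load_split_pairs(root, tgt_lang, split)) == EXPECTED_PAIRS[(tgt_lang, split)]
```

For example, `check_split(Path("/data/MuST-C"), "de", "tst-COMMON")` should return `True` on a complete English-German download if the assumed layout holds; a `False` or a `ValueError` points to a truncated or misaligned copy.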
Provided by:
FBK
The above content was collected and summarized by 遇见数据集.



