MuST-C Dataset
Source: paperswithcode.com (indexed 2025-03-25)
Download link:
https://paperswithcode.com/dataset/must-c
Description:
MuST-C currently represents the largest publicly available multilingual corpus (one-to-many) for speech translation. It covers eight language directions, from English to German, Spanish, French, Italian, Dutch, Portuguese, Romanian and Russian. The corpus consists of audio, transcriptions and translations of English TED talks, and it comes with a predefined training, validation and test split.
Provided by:
paperswithcode.com

The content above was collected and summarized by 遇见数据集.



