CoVoST

arXiv, updated 2020-06-10, indexed 2024-06-21
Download link:
https://github.com/facebookresearch/covost
Overview:
CoVoST is a multilingual speech-to-text translation dataset developed by Facebook AI. It covers translation into English from 11 languages: French, German, Dutch, Russian, Spanish, Italian, Turkish, Persian, Swedish, Mongolian, and Mandarin Chinese. The data was collected through crowdsourcing and spans more than 11,000 speakers and over 60 accents, for a total of 708 hours of speech. Professional translators produced the translations, and several quality-control measures were applied. The dataset mainly targets speech translation research, in particular the development of end-to-end models, and aims to address performance and diversity challenges in multilingual speech translation.
Provider:
Facebook AI
Created:
2020-02-04