Europarl-ST
Source: arXiv. Updated: 2020-02-12. Indexed: 2024-07-25.
Download link:
https://www.mllp.upv.es/europarl-st/
Description:
Europarl-ST is a multilingual speech translation corpus created by the MLLP Research Group at the Artificial Intelligence Institute of Universitat Politècnica de València. It consists of audio-text samples from European Parliament debates held between 2008 and 2012, covering 6 European languages and 30 translation directions (every ordered pair of distinct languages). The corpus is built from publicly available videos of the debates; the audio and text were aligned and then filtered to improve data quality. It is intended for research in automatic speech recognition, machine translation, and speech translation, and aims to ease the shortage of multilingual speech translation resources.
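The count of 30 translation directions follows from 6 languages, since each ordered (source, target) pair of distinct languages is one direction: 6 × 5 = 30. A minimal sketch of that arithmetic is below. The language codes are an assumption for illustration only, as this entry does not name the six languages, and `translation_directions` is a hypothetical helper, not part of any Europarl-ST tooling.

```python
from itertools import permutations

# Assumed ISO 639-1 codes; the entry itself does not list the six languages.
LANGUAGES = ["de", "en", "es", "fr", "it", "pt"]


def translation_directions(languages):
    """Return every ordered (source, target) pair with source != target."""
    return list(permutations(languages, 2))


directions = translation_directions(LANGUAGES)
print(len(directions))
```

Because direction matters, ("en", "de") and ("de", "en") are counted separately, which is why the total is 30 rather than 15.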
Provider:
Artificial Intelligence Institute, Universitat Politècnica de València
Created:
2019-11-08



