TR-AR and EN-AR Speech-to-Speech Corpora
arXiv | updated 2022-03-08 | indexed 2024-08-06
Download link:
http://arxiv.org/abs/2203.03601v1
Resource description:
This study created two speech-to-speech corpora, TR-AR and EN-AR, developed jointly by the American University of Beirut and the Qatar Computing Research Institute. The dataset comprises 17 hours of paired speech segments drawn from Turkish-to-Arabic and English-to-Arabic dubbed television series. Building the corpora involved video frame analysis, speech recognition, machine translation, and the removal of noisy frames. The dataset is intended mainly for developing speech-to-speech translation systems and aims to address the scarcity of existing speech corpus resources.
Provided by:
Department of Computer Science, American University of Beirut
Created:
2022-03-08



