Data underlying the research between ASR and MT quality of Automatic Subtitling Platforms

Source: DataCite Commons. Updated: 2023-10-18. Indexed: 2024-07-03.
Download link:
https://data.4tu.nl/datasets/7cfa296a-72b7-4460-acd4-86193b43701e/1
Description:
In the first experiment, which compares ASR accuracy, one set of speech-to-text data (hereafter Veed<sub>0</sub> and Iflyrec<sub>0</sub>) is generated by submitting the “Qantas Safety video” to “Iflyrec” and “Veed”. The reference speech-to-text data is transcribed from Qantas’ official channel on YouTube. In the second experiment, which compares automatic subtitling translation, three sets of data are collected and analyzed. The author uses the original speech-to-text data of “Iflyrec” and “Veed” to generate one set of automatic subtitling translations (hereafter Veed<sub>1</sub> and Iflyrec<sub>1</sub>), and then inputs the speech-to-text data on these two platforms to generate the final automatic subtitling translation version (hereafter Veed<sub>2</sub> and Iflyrec<sub>2</sub>). For the human translation reference, this paper uses the translation by a tutor affiliated with the Civil Aviation University of China.
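
The description does not say which metric the ASR accuracy comparison uses. Word error rate (WER) is one common choice for scoring a platform transcript against a reference transcript, so the following is a minimal sketch under that assumption, not the author's method. The function name `word_error_rate` and the example sentences are hypothetical and are not taken from the dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: WER is used here only as an illustration; the dataset
    description does not name the metric applied in the study.
    """
    # Simple normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences, not drawn from the dataset.
reference_text = "please fasten your seatbelt"
platform_text = "please fasten the seatbelt"
print(word_error_rate(reference_text, platform_text))  # 0.25 (one substitution in four words)
```

Under this sketch, each platform's Veed<sub>0</sub> or Iflyrec<sub>0</sub> transcript would be passed as the hypothesis and the YouTube-derived transcript as the reference; punctuation handling and number normalization are left out and would affect the score.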

Provider:
4TU.ResearchData
Created:
2023-10-18