Data underlying the research between ASR and MT quality of Automatic Subtitling Platforms

4TU.ResearchData · Updated 2023-10-18 · Indexed 2026-04-23

Download link:

https://data.4tu.nl/datasets/7cfa296a-72b7-4460-acd4-86193b43701e/1


Description:

In the first experiment, which compares ASR accuracy, one set of speech-to-text data (hereafter Veed 0 and Iflyrec 0) was generated by submitting the “Qantas Safety video” to the “Iflyrec” and “Veed” platforms. The reference speech-to-text data was transcribed from Qantas’ official YouTube channel.

In the second experiment, which compares automatic subtitle translation, three sets of data were collected and analyzed. The author first used the original speech-to-text data from “Iflyrec” and “Veed” to generate one set of automatic subtitle translations (hereafter Veed 1 and Iflyrec 1), and then input the speech-to-text data into the two platforms to generate the final automatic subtitle translations (hereafter Veed 2 and Iflyrec 2). For the human translation reference, this paper uses the translation provided by a tutor affiliated with the Civil Aviation University of China.


Created:

2023-10-18
