eSCAPE
Source: arXiv | Updated: 2018-03-20 | Indexed: 2024-06-21
Download link:
http://hltshare.fbk.eu/QT21/eSCAPE.html
Description:
eSCAPE is a large-scale synthetic corpus for automatic post-editing (APE), created by the Bruno Kessler Foundation to address the shortage of training data for models that automatically correct machine translation output. It contains 14.4 million English-German and 6.6 million English-Italian entries. Each entry is built by machine-translating the source side of a publicly available parallel corpus and treating the target side as an approximation of a human post-edit. Both phrase-based and neural translation models were used to produce the translations, which is intended to support the quality and diversity of the data. The corpus is aimed mainly at training APE systems, which seek to improve the quality of machine translation output, reduce manual post-editing effort, and adapt output to the vocabulary and style of a specific application domain.
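The construction recipe described above (translate the source side, keep the target side as a pseudo post-edit) can be sketched roughly as follows. This is only an illustration under stated assumptions: `ApeTriplet`, `build_triplets`, and `toy_translate` are hypothetical names, and the word-by-word lookup stands in for the phrase-based and neural MT systems, whose actual configuration is not described here.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple


class ApeTriplet(NamedTuple):
    """One synthetic APE training example (hypothetical layout)."""
    source: str     # original source-language sentence
    mt_output: str  # machine translation of the source sentence
    post_edit: str  # reference target, used as a stand-in for a human post-edit


def build_triplets(
    parallel_pairs: Iterable[Tuple[str, str]],
    translate: Callable[[str], str],
) -> List[ApeTriplet]:
    """Turn (source, target) sentence pairs into synthetic APE triplets."""
    return [ApeTriplet(src, translate(src), tgt) for src, tgt in parallel_pairs]


def toy_translate(sentence: str) -> str:
    """Assumed placeholder for a real MT system: a word-by-word lookup."""
    lexicon = {"the": "das", "house": "Haus", "is": "ist", "small": "klein"}
    return " ".join(lexicon.get(word.lower(), word) for word in sentence.split())


# A single English-German pair from an imaginary parallel corpus.
pairs = [("The house is small", "Das Haus ist klein")]
triplets = build_triplets(pairs, toy_translate)

for triplet in triplets:
    print(triplet)
```

Passing the translation function as a parameter means the same routine could be run once per MT system, which mirrors the idea of producing translations with more than one kind of model.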
Provider:
Bruno Kessler Foundation
Created:
2018-03-20



