An Arabic-Hebrew parallel corpus of TED talks
Source: arXiv. Updated: 2016-10-03. Indexed: 2024-07-18.
Download link:
https://wit3.fbk.eu/2016-01
Description:
This dataset, "An Arabic-Hebrew parallel corpus of TED talks", was created by FBK in Trento, Italy. It contains parallel Arabic and Hebrew texts of about 2,000 TED talks, totaling roughly 3.5M tokens per language. The raw data was collected through WIT3, and the texts were then aligned and reconstructed with purpose-built algorithms. The dataset is intended mainly for machine translation research, specifically translation between Arabic and Hebrew.
Provider:
FBK, Trento, Italy
Created:
2016-10-03



