UN Parallel Corpus Annotated for Translation Direction
arXiv · Updated 2018-05-20 · Indexed 2024-07-25
Download link:
shuly@cs.haifa.ac.il
Resource description:
UN Parallel Corpus Annotated for Translation Direction is a dataset created by the University of Haifa for distinguishing translated texts from original texts. It contains bilingual parallel corpora pairing English with each of the other five official United Nations languages (French, Spanish, Russian, Arabic, and Chinese), totaling approximately 8.92 million valid sentences. Translation direction is determined from the corpus's directory structure and link files, and machine learning methods are used for classification. The dataset is intended primarily for linguistic research, in particular the analysis of features of translated language, with the aim of improving the accuracy of translated-text identification.
Provider:
University of Haifa
Created:
2018-05-20



