WeTS
Source: arXiv | Updated: 2022-10-11 | Indexed: 2024-06-21
Download link:
https://github.com/ZhenYangIACAS/WeTS.git
Description:
WeTS is a benchmark dataset for translation suggestion created by the WeChat Artificial Intelligence Pattern Recognition Center. It covers four translation directions: English to German, German to English, Chinese to English, and English to Chinese. The dataset is annotated by expert translators and is intended to address the lack of publicly available gold-standard data in existing research, and so to advance work on translation suggestion. It contains 63,716 records. Alongside the human-annotated benchmark, WeTS provides synthetic corpora built with several methods, which support model pre-training and improve translation suggestion performance. Its main application is post-editing of machine translation output, where the goal is to reduce translators' cognitive load and editing time.
Provided by:
WeChat Artificial Intelligence Pattern Recognition Center
Created:
2021-10-11



