Pseudo-QE Dataset

Name: Pseudo-QE Dataset
Creator: Department of Computer Science and Engineering, Korea University
Published: 2021-11-01 16:37:30
License: No description available

arXiv | Updated 2021-11-01 | Indexed 2024-08-06

Download link:

http://arxiv.org/abs/2111.00767v1

Resource description:

The Pseudo-QE Dataset is produced by an automatic generation tool developed by the Department of Computer Science and Engineering at Korea University. The tool takes a monolingual or parallel corpus as input and automatically generates a dataset for training quality estimation (QE) models. To create a dataset, the user selects a language pair, an annotation level (word or sentence), and a corpus type (monolingual or parallel). The dataset is intended mainly to improve machine translation quality estimation, particularly for low-resource language pairs, where data augmentation and the use of multiple language pairs help address the limits on QE data construction.
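The three user choices described above (language pair, annotation level, corpus type) can be pictured as a small configuration record. The following is only an illustrative sketch: the names `PseudoQERequest`, `AnnotationLevel`, and `CorpusType` are hypothetical and are not taken from the tool itself, whose actual interface is not documented on this page.

```python
from dataclasses import dataclass
from enum import Enum


class AnnotationLevel(Enum):
    """Granularity of the QE labels (hypothetical names)."""
    WORD = "word"
    SENTENCE = "sentence"


class CorpusType(Enum):
    """Kind of input corpus (hypothetical names)."""
    MONOLINGUAL = "monolingual"
    PARALLEL = "parallel"


@dataclass(frozen=True)
class PseudoQERequest:
    """Hypothetical container for the three user choices.

    This is NOT the tool's real API; it only mirrors the options
    listed in the description.
    """
    source_lang: str
    target_lang: str
    annotation_level: AnnotationLevel
    corpus_type: CorpusType

    def __post_init__(self):
        # A language pair needs two distinct, non-empty language codes.
        if not self.source_lang or not self.target_lang:
            raise ValueError("both languages of the pair must be given")
        if self.source_lang == self.target_lang:
            raise ValueError("source and target languages must differ")

    def describe(self) -> str:
        """Return a one-line summary of the request."""
        return (
            f"{self.source_lang}-{self.target_lang}, "
            f"{self.annotation_level.value}-level labels, "
            f"{self.corpus_type.value} corpus"
        )


# Example: word-level labels for an English-Korean pair from a monolingual corpus.
request = PseudoQERequest("en", "ko", AnnotationLevel.WORD, CorpusType.MONOLINGUAL)
print(request.describe())
```

Using enumerations rather than free-form strings keeps the two-valued options (word or sentence, monolingual or parallel) from being misspelled; the language codes are left as plain strings because the page does not list which pairs are supported.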

Provider:

Department of Computer Science and Engineering, Korea University

Created:

2021-11-01
