Annotated SiMT Datasets

Indexed from arXiv on 2025-09-30

Download link:

https://github.com/EurekaForNLP/SimulPL

Resource overview:

This dataset is a corpus of human preference annotations for text-to-text simultaneous translation. It was used to validate the SimulPL framework and to improve alignment with human preferences in simultaneous translation. It includes a 100,000-sample subset of the OMT training data, plus additional annotated data. The target task is simultaneous machine translation (SiMT).