Annotated SiMT Datasets
Source: arXiv, indexed 2025-09-30
Download link:
https://github.com/EurekaForNLP/SimulPL
Resource description:
This dataset is a corpus of human preference annotations for text-to-text simultaneous machine translation (SiMT). It was used to validate the SimulPL framework and to improve how well SiMT outputs align with human preferences. It contains a 100,000-sample subset of the OMT training data, along with additional annotated data.