CoVoSwitch
Indexed from arXiv on 2025-09-30
Download link:
https://github.com/sophiayk20/covoswitch
Description:
CoVoSwitch is a synthetic dataset for evaluating machine translation of code-switched text. It is constructed by replacing intonation units in English sentences with non-English tokens drawn from the CoVoST 2 dataset. Building the switches around intonation units is intended to improve translation performance, and the dataset is designed to test how well multilingual translation models handle code-switching scenarios. It covers 13 languages of varying resource levels, from low-resource to high-resource.
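The replacement idea described above can be illustrated with a toy sketch. It assumes the English sentence has already been split into intonation (prosodic) units and that a non-English counterpart for each replaced unit is already known; how the dataset actually obtains either is not stated here. The function name `code_switch` and the example sentence are hypothetical and are not taken from the dataset or its repository.

```python
from typing import Dict, List


def code_switch(units: List[str], replacements: Dict[int, str]) -> str:
    """Join units into one sentence, swapping selected units for replacements.

    `units` is an English sentence pre-split into units (assumed given).
    `replacements` maps a unit index to its non-English text (assumed given).
    """
    for index in replacements:
        if not 0 <= index < len(units):
            raise IndexError(f"no unit at index {index}")
    mixed = [replacements.get(i, unit) for i, unit in enumerate(units)]
    return " ".join(mixed)


# Toy data invented for illustration only, not drawn from CoVoSwitch.
english_units = ["I went to the market", "to buy some bread"]
spanish_for_unit = {1: "para comprar pan"}

print(code_switch(english_units, spanish_for_unit))
```

In this sketch the switch happens only at unit boundaries, so each replaced span is a whole unit rather than an arbitrary word.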
Provided by:
Authors of the paper



