rl-bandits-lab/SEGALE-WMT24
Hugging Face · Updated 2025-11-05 · Indexed 2025-11-15
Download link:
https://hf-mirror.com/datasets/rl-bandits-lab/SEGALE-WMT24
Summary:
SEGALE-WMT24 is a long-context machine translation (MT) evaluation dataset built for the SEGALE scheme. It covers three language directions and four synthetic setups. It is intended for long-context MT benchmarking and meta-evaluation, specifically for measuring how sensitive metrics and systems are to under-translation, over-translation, and changes in segment boundaries. The three language pairs are English-German, English-Spanish, and Japanese-Chinese. Each pair comes in four synthetic setups: drop source (simulating under-translation), drop target (simulating over-translation), merge (simulating sentence fusion), and raw (baseline system predictions). Each translation segment includes the source text, the system translation, a system identifier, a document ID, and the segment's index within its document.
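Because records are stored per segment but the dataset targets long-context evaluation, a typical first step may be to stitch segments back into whole documents per system. The sketch below shows one way to do that in plain Python. The field names (`source`, `target`, `system_id`, `doc_id`, `seg_idx`) and the helper `assemble_documents` are hypothetical, chosen for illustration rather than taken from the dataset's confirmed schema. Loading the actual files (for example with the Hugging Face `datasets` library) is left out because the config and column names are not stated here.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record layout: these field names are assumptions for
# illustration, not the dataset's confirmed column names.
@dataclass
class Segment:
    source: str     # source-language text of the segment
    target: str     # system translation of the segment
    system_id: str  # identifier of the MT system
    doc_id: str     # document the segment belongs to
    seg_idx: int    # position of the segment within its document


def assemble_documents(segments, sep=" "):
    """Group segments by (system_id, doc_id) and join them in seg_idx order.

    Returns a dict mapping (system_id, doc_id) -> (source_doc, target_doc).
    """
    groups = defaultdict(list)
    for seg in segments:
        groups[(seg.system_id, seg.doc_id)].append(seg)

    docs = {}
    for key, segs in groups.items():
        # Records may arrive out of order, so restore document order first.
        segs.sort(key=lambda s: s.seg_idx)
        source_doc = sep.join(s.source for s in segs)
        target_doc = sep.join(s.target for s in segs)
        docs[key] = (source_doc, target_doc)
    return docs


if __name__ == "__main__":
    segments = [
        Segment("How are you?", "Wie geht es dir?", "sysA", "doc1", 1),
        Segment("Hello.", "Hallo.", "sysA", "doc1", 0),
    ]
    print(assemble_documents(segments)[("sysA", "doc1")])
    # ('Hello. How are you?', 'Hallo. Wie geht es dir?')
```

Grouping on the (system, document) pair rather than on the document alone keeps translations from different systems of the same source document separate. This matters when comparing systems or metrics on the perturbed setups.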
Provided by:
rl-bandits-lab



