AM2iCo (Adversarial and Multilingual Meaning in Context)

Name: AM2iCo (Adversarial and Multilingual Meaning in Context)
Creator: OpenDataLab
Published: 2026-05-24 07:30:18
License: No description available

OpenDataLab, updated 2026-05-24, indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/AM2iCo

Resource summary:

In this work, we present AM2iCo, a broad-coverage, carefully designed cross-lingual and multilingual evaluation benchmark. It evaluates how well state-of-the-art representation models can reason about cross-lingual, word-level concept alignment in context across 14 language pairs. Compared with related tasks such as WiC and XL-WiC, our dataset covers more languages and more unique words, has higher data quality, and yields higher human performance (approximately 90%, versus 80% for WiC and XL-WiC). These improvements reflect the introduction of more challenging adversarial examples and a more careful data extraction strategy designed for explicit cross-lingual evaluation. The 14 language pairs draw on the following 15 languages: English (EN), German (DE), Russian (RU), Japanese (JA), Korean (KO), Mandarin Chinese (ZH), Arabic (AR), Indonesian (IN), Finnish (FI), Turkish (TR), Basque (EU), Georgian (KA), Urdu (UR), Bengali (BN), and Kazakh (KK).
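Because AM2iCo is positioned alongside WiC and XL-WiC, it helps to see how a result per language pair might be computed. The following is a minimal sketch, assuming a WiC-style binary format in which each item presents a target word in two contexts and asks whether the two occurrences share a meaning, with performance reported as accuracy. The `Example` record, its field names, and pair labels such as `"EN-DE"` are hypothetical illustrations, not the dataset's released schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Example:
    # Hypothetical record layout for illustration only;
    # consult the released files for the real schema.
    pair: str         # language pair label, e.g. "EN-DE" (assumed naming)
    label: bool       # gold: True if the target words share a meaning in context
    prediction: bool  # model (or annotator) judgement for the same item


def accuracy_by_pair(examples):
    """Return {pair: accuracy} for WiC-style binary same-meaning judgements."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        total[ex.pair] += 1
        # A judgement counts as correct when it matches the gold label.
        correct[ex.pair] += int(ex.prediction == ex.label)
    return {pair: correct[pair] / total[pair] for pair in total}


if __name__ == "__main__":
    demo = [
        Example("EN-DE", True, True),
        Example("EN-DE", False, True),
        Example("EN-KK", False, False),
        Example("EN-KK", True, True),
    ]
    print(accuracy_by_pair(demo))  # {'EN-DE': 0.5, 'EN-KK': 1.0}
```

Reporting accuracy separately for each pair, rather than pooling all items, keeps weaker results on lower-resource languages visible instead of letting them be averaged away.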

Provider:

OpenDataLab

Created:

2022-06-23

Dataset introduction