XTREME

Name: XTREME
Creator: Carnegie Mellon University
Published: 2020-09-05 01:56:31
License: No description available

arXiv; updated 2020-09-05; indexed 2024-06-21

Download link:

https://sites.research.google/xtreme

Resource description:

XTREME is a large-scale multilingual, multi-task benchmark created by Carnegie Mellon University for evaluating cross-lingual generalization. It comprises 9 tasks in 40 languages spanning 12 language families, and is intended to help overcome language barriers and advance research on multilingual models. XTREME focuses on the zero-shot cross-lingual transfer setting, in which training data is provided only in English and models are evaluated on the other languages. Its tasks include natural language inference, question answering, and part-of-speech tagging, among others. The underlying aim is to transfer linguistic knowledge across languages through cross-lingual learning, easing the data sparsity that affects many languages.
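
The zero-shot transfer setting described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `zero_shot_scores`, `toy_predict`, and the toy test set are hypothetical and are not part of XTREME or its official evaluation code. The sketch only shows the shape of the protocol: a model that has seen English training data alone is scored per language, and the transfer score is averaged over the non-English languages.

```python
from collections import defaultdict


def zero_shot_scores(examples, predict, source_lang="en"):
    """Return per-language accuracy and the mean accuracy over target languages.

    examples: iterable of (language_code, text, gold_label) tuples.
    predict:  callable mapping text to a predicted label.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for lang, text, label in examples:
        total[lang] += 1
        if predict(text) == label:
            correct[lang] += 1

    per_lang = {lang: correct[lang] / total[lang] for lang in total}

    # The transfer score excludes the source (training) language.
    targets = [acc for lang, acc in per_lang.items() if lang != source_lang]
    transfer_avg = sum(targets) / len(targets) if targets else float("nan")
    return per_lang, transfer_avg


def toy_predict(text):
    # Hypothetical stand-in for a model fine-tuned on English data only.
    lowered = text.lower()
    return "positive" if "good" in lowered or "bueno" in lowered else "negative"


# Hypothetical toy test set; real XTREME tasks are far larger.
toy_test = [
    ("en", "good film", "positive"),
    ("en", "bad film", "negative"),
    ("es", "muy bueno", "positive"),
    ("es", "muy malo", "negative"),
    ("de", "sehr gut", "positive"),
    ("de", "sehr schlecht", "negative"),
]

per_lang, transfer_avg = zero_shot_scores(toy_test, toy_predict)
print(per_lang)
print(transfer_avg)
```

Reporting the per-language scores alongside the average keeps weak target languages visible, since a single mean can hide a large gap between English and the transfer languages.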

Provider:

Carnegie Mellon University

Created:

2020-03-25
