XTREME
Source: arXiv | Updated: 2020-09-05 | Indexed: 2024-06-21
Download link:
https://sites.research.google/xtreme
Description:
XTREME is a large-scale multilingual, multi-task benchmark for evaluating the cross-lingual generalization of multilingual models, developed by Carnegie Mellon University together with Google Research and DeepMind. It covers 40 typologically diverse languages from 12 language families and 9 tasks, with the goal of lowering language barriers and advancing research on multilingual models. XTREME focuses on the zero-shot cross-lingual transfer setting: labeled training data is provided only in English, and models are then evaluated on the other languages. Its tasks include natural language inference, question answering, part-of-speech tagging, named entity recognition, and sentence retrieval. The aim is to transfer linguistic knowledge across languages through cross-lingual learning and thereby mitigate data sparsity.
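The zero-shot transfer setting described above can be illustrated with a minimal sketch: train once on English data, then score the same model on every target language without any further training. The function names (`zero_shot_transfer`, `train_majority`), the `(text, label)` example format, and the toy data are hypothetical and are not part of the XTREME release. A majority-label baseline stands in for real fine-tuning, and plain accuracy stands in for XTREME's task-specific metrics (e.g. F1 for question answering).

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical example format: (input text, gold label).
Example = Tuple[str, str]
Predictor = Callable[[str], str]


def zero_shot_transfer(
    train: Callable[[List[Example]], Predictor],
    english_train: List[Example],
    test_sets: Dict[str, List[Example]],
) -> Dict[str, float]:
    """Train on English only, then compute accuracy on each target language."""
    predict = train(english_train)  # the only supervision comes from English
    scores = {}
    for lang, examples in test_sets.items():
        correct = sum(predict(text) == gold for text, gold in examples)
        scores[lang] = correct / len(examples)
    return scores


def train_majority(examples: List[Example]) -> Predictor:
    """Toy stand-in for fine-tuning: always predict the most frequent English label."""
    labels = [gold for _, gold in examples]
    top = max(set(labels), key=labels.count)
    return lambda text: top


# Toy usage with made-up data (not real XTREME examples).
english_train = [("a", "entailment"), ("b", "entailment"), ("c", "contradiction")]
test_sets = {
    "de": [("x", "entailment"), ("y", "neutral")],
    "sw": [("z", "entailment"), ("w", "entailment"), ("v", "entailment"), ("u", "neutral")],
}
scores = zero_shot_transfer(train_majority, english_train, test_sets)
print(scores)                  # per-language accuracy
print(mean(scores.values()))   # average over target languages
```

Passing the training routine in as a function keeps the protocol separate from any particular model, so a real multilingual encoder could replace `train_majority` without changing the evaluation loop.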
Provider:
Carnegie Mellon University
Created:
2020-03-25



