ORCA

Name: ORCA
Creator: Deep Learning and Natural Language Processing Group, University of British Columbia
Published: 2023-05-30 02:27:37
License: No description available

arXiv, updated 2023-05-30, indexed 2024-06-21

Download link:

https://orca.dlnlp.ai/

Resource overview:

ORCA is a benchmark created by the Deep Learning and Natural Language Processing Group at the University of British Columbia to provide a challenging and diverse evaluation suite for Arabic language understanding. It comprises 60 publicly available datasets organized into seven task clusters: sentence classification, structured prediction, topic classification, semantic textual similarity, natural language inference, question answering, and word sense disambiguation. Its design reflects the richness and diversity of Arabic, covering both Modern Standard Arabic and dialectal Arabic. ORCA also provides detailed statistical analyses and model evaluation results, along with an interactive public leaderboard intended to advance Arabic and multilingual natural language processing.

Provided by:

Deep Learning and Natural Language Processing Group, University of British Columbia

Created:

2022-12-21
