Open Arabic LLM Leaderboard (OALL)
Source: arXiv (indexed 2025-09-30)
Download link:
https://huggingface.co/spaces/OALL/Open-Arabic-LLM-Leaderboard
Description:
This dataset is a benchmark for evaluating Arabic large language models (LLMs). It comprises three core benchmarks, AlGhafa, ACVA, and Arabic MMLU, together with translated versions of standard LLM benchmarks. It assesses models' text completion ability across diverse domains, as well as logical correctness and factual knowledge. Its limitations include narrow coverage and overly simplified evaluation criteria.
Provided by:
OALL



