silma-ai/arabic-broad-benchmark
Source: Hugging Face. Updated 2025-05-23; indexed 2025-05-31.
Download link:
https://hf-mirror.com/datasets/silma-ai/arabic-broad-benchmark
Description:
The Arabic Broad Benchmark (ABB) is a comprehensive benchmark dataset created by SILMA.AI to evaluate the performance of large language models in Arabic. It contains 470 high-quality, human-validated questions sampled from 64 Arabic benchmark datasets, covering 22 categories and skills. The tasks include question answering, translation, summarization, and table question answering. Evaluation combines manual rules with LLM-based scoring. The dataset is used in the ABL Leaderboard to rank Arabic language models. The README gives instructions for benchmarking a model with the dataset, including dependencies, setup, and an example output. It also outlines the scoring rules, data categories, distribution, and sources, giving a thorough picture of the dataset's composition and usage.
Provider:
silma-ai



