silma-ai/silma-rag-qa-benchmark-v1.0

Name: silma-ai/silma-rag-qa-benchmark-v1.0
Creator: silma-ai
Published: 2025-06-11 12:55:21
License: No description available

Hugging Face | Updated 2025-06-11 | Indexed 2024-12-14

Download link:

https://hf-mirror.com/datasets/silma-ai/silma-rag-qa-benchmark-v1.0


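Resource overview:

SILMA RAGQA is a dataset and benchmark created by silma.ai to evaluate the effectiveness of Arabic language models on extractive question answering tasks, particularly for RAG applications. The benchmark comprises 17 bilingual datasets in Arabic and English covering multiple domains. The dataset is used to evaluate the upcoming SILMA Kashif model. The capabilities tested include handling short and long contexts, giving short and long answers, answering complex numerical questions, answering questions based on tabular data, multi-hop question answering, negative rejection, multi-domain question answering, and noise robustness.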

The SILMA RAGQA Benchmark Dataset V1.0 is a dataset and benchmark designed to assess the effectiveness of Arabic language models in extractive question answering tasks, with a specific emphasis on RAG applications. The benchmark includes 17 bilingual datasets in Arabic and English, spanning domains such as legal, medical, finance, and biology. It tests several capabilities: general QA, handling short and long contexts, providing short and long answers, answering complex numerical questions, handling tabular data, multi-hop question answering, negative rejection, multi-domain capabilities, and noise robustness. The README lists the data sources with their respective languages, sizes, and links. The benchmark also includes a script for evaluating any model against it, using metrics such as Exact Match, BLEU, ROUGE, and BERTScore. The README also mentions future work and provides contact information for feedback.
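As an illustration of the simplest metric named above, Exact Match can be sketched in a few lines of Python. This is a minimal sketch and not the benchmark's own evaluation script: the function names (`normalize`, `exact_match`, `exact_match_score`) and the normalization rules (lowercasing, stripping ASCII punctuation, collapsing whitespace) are assumptions chosen for the example, and the official script may normalize answers differently.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace.

    Assumption for this sketch: only ASCII punctuation is removed, so
    Arabic punctuation such as the Arabic comma is left untouched.
    """
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> int:
    """Return 1 if the normalized strings are identical, else 0."""
    return int(normalize(prediction) == normalize(reference))


def exact_match_score(predictions, references) -> float:
    """Mean exact match over paired predictions and references."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    matches = sum(exact_match(p, r) for p, r in zip(predictions, references))
    return matches / len(predictions)
```

Exact Match is strict by design: a prediction that paraphrases the reference scores zero, which is one reason benchmarks like this one pair it with overlap and embedding-based metrics such as BLEU, ROUGE, and BERTScore.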

Provided by:

silma-ai
