NAMAA-Space/Ar-Reranking-Eval

Name: NAMAA-Space/Ar-Reranking-Eval
Creator: NAMAA-Space
Published: 2024-11-01 18:13:03
License: apache-2.0

Source: Hugging Face. Updated 2024-11-01. Indexed 2025-04-12.

Download link:

https://hf-mirror.com/datasets/NAMAA-Space/Ar-Reranking-Eval

Resource description:

---
dataset_info:
  features:
  - name: query
    dtype: string
  - name: candidate_document
    dtype: string
  - name: relevance_label
    dtype: int64
  splits:
  - name: train
    num_bytes: 214773
    num_examples: 468
  download_size: 80551
  dataset_size: 214773
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: apache-2.0
language:
- ar
pretty_name: A
---

# Arabic Reranking Evaluation Dataset

## Dataset Overview

This dataset, containing 468 rows, is curated for evaluating reranking and retrieval models in Arabic. It covers topics such as artificial intelligence, machine learning, data analysis, technology, and education, with diverse query complexities and document lengths. The dataset is intended to aid in developing and benchmarking Arabic language models that rank information based on relevance.

## Dataset Structure

Each entry in the dataset is structured as follows:

- **Query** (`string`): A natural language query in Arabic, representing user intent across multiple domains.
- **Candidate Document** (`string`): A potential answer or document related to the query. Documents vary in length, complexity, and relevance to the query.
- **Relevance Label** (`binary`): A label indicating whether the candidate document is relevant (`1`) or irrelevant (`0`) to the query.

Each query includes **2-3 relevant** and **2-3 irrelevant** documents to ensure balanced training and evaluation.

## Example Structure

```
{
  "query": "ما هي تطبيقات الذكاء الاصطناعي في المجالات المختلفة؟",
  "candidate_document": "الذكاء الاصطناعي يستخدم في تحسين الإنتاجية في الصناعات.",
  "relevance_label": 1
}
```

## Key Statistics

- Total Entries: 468
- Unique Queries: 100+
- Relevant Documents: ~234
- Irrelevant Documents: ~234
- Topics: Artificial Intelligence, Data Analysis, Education, Healthcare, and General Knowledge

## Usage and Applications

- We aim to use this dataset for evaluating Arabic reranking models that rank documents by relevance.
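Because each row pairs one query with one candidate document, a reranking evaluation first needs the rows grouped by query. The following is a minimal sketch of that step, not the authors' tooling. It assumes the `datasets` library is installed for the optional `load_rows` helper; the helper names (`load_rows`, `group_by_query`, `label_counts`) and the second sample row are hypothetical and only illustrate the three documented fields.

```python
from collections import defaultdict


def load_rows():
    """Load the train split from the Hugging Face Hub.

    Assumes `pip install datasets` and network access; not called by default.
    """
    from datasets import load_dataset

    ds = load_dataset("NAMAA-Space/Ar-Reranking-Eval", split="train")
    return list(ds)  # each item is a dict with the three documented fields


def group_by_query(rows):
    """Map each query to its list of (candidate_document, relevance_label) pairs."""
    groups = defaultdict(list)
    for row in rows:
        pair = (row["candidate_document"], int(row["relevance_label"]))
        groups[row["query"]].append(pair)
    return dict(groups)


def label_counts(rows):
    """Count relevant (1) and irrelevant (0) rows."""
    relevant = sum(1 for row in rows if int(row["relevance_label"]) == 1)
    return {"relevant": relevant, "irrelevant": len(rows) - relevant}


# The first row is the example from the dataset card; the second is a
# hypothetical irrelevant document added only for illustration.
sample_rows = [
    {
        "query": "ما هي تطبيقات الذكاء الاصطناعي في المجالات المختلفة؟",
        "candidate_document": "الذكاء الاصطناعي يستخدم في تحسين الإنتاجية في الصناعات.",
        "relevance_label": 1,
    },
    {
        "query": "ما هي تطبيقات الذكاء الاصطناعي في المجالات المختلفة؟",
        "candidate_document": "الطقس اليوم مشمس.",
        "relevance_label": 0,
    },
]

groups = group_by_query(sample_rows)
counts = label_counts(sample_rows)
```

To run the same grouping on the real data, replace `sample_rows` with the result of `load_rows()`.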
## Evaluation Metrics

The dataset can be evaluated using common ranking metrics:

| Metric | Description |
|--------|-------------|
| **Mean Reciprocal Rank (MRR)** | Evaluates the rank position of the first relevant document. |
| **Mean Average Precision (MAP)** | Assesses average precision across multiple relevant documents. |
| **nDCG (Normalized Discounted Cumulative Gain)** | Measures relevance at various ranks, taking graded relevance into account. |
| **Precision@K and Recall@K** | Measures precision and recall within the top-K ranked documents. |

## Limitations and Considerations

- **Binary Relevance:** The dataset uses binary labels (1 for relevant, 0 for irrelevant), which may not fully capture nuanced relevance levels.
- **Domain Representation:** While the dataset covers diverse topics, it may not represent every possible domain in Arabic content.
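The metrics listed under Evaluation Metrics can be sketched in plain Python for a single query. This is an illustrative implementation under stated assumptions, not code shipped with the dataset. It assumes `labels` holds the binary relevance labels of one query's candidates, already ordered by a reranker's scores (best first). The function names and the scores in the usage lines are hypothetical.

```python
import math


def rank_labels(scored):
    """Turn (score, label) pairs into labels ordered by descending score."""
    return [label for _, label in sorted(scored, key=lambda p: p[0], reverse=True)]


def reciprocal_rank(labels):
    """1 / rank of the first relevant document, or 0.0 if none is relevant."""
    for rank, label in enumerate(labels, start=1):
        if label == 1:
            return 1.0 / rank
    return 0.0


def average_precision(labels):
    """Mean of precision values taken at each relevant document's rank."""
    hits, total = 0, 0.0
    for rank, label in enumerate(labels, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def precision_at_k(labels, k):
    """Fraction of the top-k documents that are relevant."""
    return sum(labels[:k]) / k


def recall_at_k(labels, k):
    """Fraction of all relevant documents that appear in the top k."""
    n_relevant = sum(labels)
    return sum(labels[:k]) / n_relevant if n_relevant else 0.0


def ndcg_at_k(labels, k):
    """DCG of the given order divided by the DCG of the ideal order."""
    def dcg(values):
        return sum(v / math.log2(i + 1) for i, v in enumerate(values, start=1))

    ideal = dcg(sorted(labels, reverse=True)[:k])
    return dcg(labels[:k]) / ideal if ideal else 0.0


def mean_over_queries(metric, rankings):
    """Average a per-query metric; gives MRR or MAP when used with the above."""
    return sum(metric(labels) for labels in rankings) / len(rankings)


# Hypothetical reranker scores for one query with two relevant candidates.
labels = rank_labels([(0.9, 0), (0.8, 1), (0.1, 0), (0.5, 1)])
mrr = mean_over_queries(reciprocal_rank, [labels])
```

With binary labels, nDCG reduces to a position-discounted hit count. The graded-relevance case mentioned in the table would only differ in the label values passed in.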

Provided by:

NAMAA-Space
