MohammadKhodadad/multi-lingual-qac
Source: Hugging Face. Updated 2026-04-07. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/MohammadKhodadad/multi-lingual-qac
Resource description:
---
configs:
- config_name: corpus
  data_files:
  - split: train
    path: data/corpus/*.parquet
- config_name: queries
  data_files:
  - split: train
    path: data/queries/*.parquet
- config_name: qrels
  data_files:
  - split: train
    path: data/qrels/*.parquet
- config_name: qac
  data_files:
  - split: train
    path: data/qac/*.parquet
---
# Multi-lingual JRC-Acquis QAC
## Overview
Question–Answer–Context (QAC) data derived from the JRC-Acquis multilingual legal corpus.
## Dataset Structure
- `corpus`: retrieval documents
- `queries`: benchmark queries
- `qrels`: relevance judgments
- `qac`: full question-answer-context rows for inspection and analysis
Each config currently contains a `train` split.
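
As a minimal sketch of how the four configs might be loaded, the helper below wraps `datasets.load_dataset` from the Hugging Face `datasets` library. The names `CONFIG_NAMES`, `REPO_ID`, and `load_config` are illustrative and not part of the dataset; the config names and the `train` split are taken from the card's front matter.

```python
# Config names as declared in the dataset card's YAML front matter.
CONFIG_NAMES = ("corpus", "queries", "qrels", "qac")
REPO_ID = "MohammadKhodadad/multi-lingual-qac"


def load_config(name, split="train"):
    """Load one config of the dataset from the Hugging Face Hub.

    Hypothetical helper for illustration; requires network access and
    the `datasets` package when actually called with a valid name.
    """
    if name not in CONFIG_NAMES:
        raise ValueError(f"unknown config {name!r}; expected one of {CONFIG_NAMES}")
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name, split=split)
```

Calling `load_config("corpus")` should return a `Dataset` object whose `column_names` attribute lists the actual fields, which this card does not document.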
## Data Source
- **Source dataset:** JRC-Acquis, a multilingual aligned corpus of European Union legal texts.
- **This dataset:** The corpus subset, questions, and answers are benchmark artifacts derived from JRC-Acquis language pairs. For each selected pair, one query is generated from the translated side and linked to both documents in the pair.
- **Note:** Verify the latest upstream distribution terms and citation guidance from the official JRC-Acquis source before public redistribution.
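
Because each query is linked to both documents of a language pair, a relevance lookup would be expected to hold two corpus ids per query. The sketch below illustrates this with toy rows. The column names `query-id`, `corpus-id`, and `score` follow a common retrieval-benchmark convention and are an assumption here, as are the example ids; check the real `qrels` columns before relying on them.

```python
from collections import defaultdict


def build_qrels_index(qrels_rows):
    """Map each query id to the set of corpus ids judged relevant.

    Assumes rows shaped like {"query-id", "corpus-id", "score"};
    these field names are hypothetical for this dataset.
    """
    index = defaultdict(set)
    for row in qrels_rows:
        # Keep only positive judgments.
        if row["score"] > 0:
            index[row["query-id"]].add(row["corpus-id"])
    return dict(index)


# Toy rows (not taken from the dataset): each query points at both
# sides of one language pair.
toy_qrels = [
    {"query-id": "q1", "corpus-id": "doc-en-1", "score": 1},
    {"query-id": "q1", "corpus-id": "doc-de-1", "score": 1},
    {"query-id": "q2", "corpus-id": "doc-fr-7", "score": 1},
    {"query-id": "q2", "corpus-id": "doc-it-7", "score": 1},
]

qrels_index = build_qrels_index(toy_qrels)
```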
<!-- BEGIN MTEB LEADERBOARD -->
## Leaderboard
The latest generated benchmark comparison tables are also available under `benchmark_outputs/mteb_tables`.
### Overview
- Dataset: `MohammadKhodadad/multi-lingual-qac`
- Models compared: `2`
- Best model by `ndcg_at_10`: `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` (0.1766)
### Ranking
| Rank | Model | Main score | nDCG@10 | MAP@10 | MRR@10 | Hit@10 | Recall@10 | Time (s) |
| ---: | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` | **0.1766** | **0.1766** | **0.1343** | **0.2005** | **0.3721** | **0.2207** | 388.9 |
| 2 | `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` | 0.1430 | 0.1430 | 0.0988 | 0.1929 | 0.3488 | 0.1811 | 121.6 |
### Metric Winners
| Metric | Best model | Score |
| --- | --- | ---: |
| `main_score` | `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` | 0.1766 |
| `ndcg_at_10` | `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` | 0.1766 |
| `map_at_10` | `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` | 0.1343 |
| `mrr_at_10` | `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` | 0.2005 |
| `hit_rate_at_10` | `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` | 0.3721 |
| `recall_at_10` | `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` | 0.2207 |
| `ndcg_at_100` | `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` | 0.2370 |
| `hit_rate_at_100` | `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` | 0.5814 |
<!-- END MTEB LEADERBOARD -->
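
To make the leaderboard columns easier to interpret, here is a minimal sketch of the standard binary-relevance definitions of nDCG@k, MRR@k, hit rate@k, and recall@k for a single query. It is not the evaluation code that produced the tables above, which may differ in details such as tie handling or graded relevance. The reported scores are presumably averages of per-query values like these. The function name and the toy ids are illustrative.

```python
import math


def metrics_at_k(ranked_ids, relevant_ids, k=10):
    """Compute binary-relevance retrieval metrics for one query."""
    top_k = ranked_ids[:k]
    hits = [1 if doc_id in relevant_ids else 0 for doc_id in top_k]

    # DCG: a hit at 1-based rank r contributes 1 / log2(r + 1).
    dcg = sum(h / math.log2(rank + 1) for rank, h in enumerate(hits, start=1))
    # Ideal DCG: all relevant documents ranked first, truncated at k.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))

    # Reciprocal rank of the first relevant document (0 if none in the top k).
    rr = next((1 / rank for rank, h in enumerate(hits, start=1) if h), 0.0)

    return {
        "ndcg": dcg / idcg if idcg else 0.0,
        "mrr": rr,
        "hit": 1.0 if any(hits) else 0.0,
        "recall": sum(hits) / len(relevant_ids) if relevant_ids else 0.0,
    }


# Toy query with two relevant documents, retrieved at ranks 2 and 4.
example = metrics_at_k(["d3", "d1", "d7", "d2"], {"d1", "d2"}, k=10)
```

In this toy case both relevant documents appear in the top 10, so recall and hit rate are both 1.0, while nDCG and MRR are lower because the first relevant document is at rank 2 rather than rank 1.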
Provider:
MohammadKhodadad



