MohammadKhodadad/multi-lingual-qac

Source: Hugging Face | Updated: 2026-04-07 | Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/MohammadKhodadad/multi-lingual-qac
Description:
---
configs:
- config_name: corpus
  data_files:
  - split: train
    path: data/corpus/*.parquet
- config_name: queries
  data_files:
  - split: train
    path: data/queries/*.parquet
- config_name: qrels
  data_files:
  - split: train
    path: data/qrels/*.parquet
- config_name: qac
  data_files:
  - split: train
    path: data/qac/*.parquet
---

# Multi-lingual JRC-Acquis QAC

## Overview

Question–Answer–Context (QAC) data derived from the JRC-Acquis multilingual legal corpus.

## Dataset Structure

- `corpus`: retrieval documents
- `queries`: benchmark queries
- `qrels`: relevance judgments
- `qac`: full question-answer-context rows for inspection and analysis

Each config currently contains a single `train` split.

## Data Source

- **Source dataset:** JRC-Acquis, a multilingual aligned corpus of European Union legal texts.
- **This dataset:** The corpus subset, questions, and answers are derived benchmark artifacts built from JRC-Acquis language pairs. For each selected pair, one query is generated from the translated side and linked to both paired documents.
- **Note:** Before redistributing publicly, verify the latest upstream distribution terms and citation guidance from the official JRC-Acquis source.

<!-- BEGIN MTEB LEADERBOARD -->
## Leaderboard

The latest generated benchmark comparison tables are also available under `benchmark_outputs/mteb_tables`.
### Overview

- Dataset: `MohammadKhodadad/multi-lingual-qac`
- Models compared: `2`
- Best model by `ndcg_at_10`: `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` (0.1766)

### Ranking

| Rank | Model | Main score | nDCG@10 | MAP@10 | MRR@10 | Hit@10 | Recall@10 | Time (s) |
| ---: | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` | **0.1766** | **0.1766** | **0.1343** | **0.2005** | **0.3721** | **0.2207** | 388.9 |
| 2 | `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` | 0.1430 | 0.1430 | 0.0988 | 0.1929 | 0.3488 | 0.1811 | 121.6 |

### Metric Winners

| Metric | Best model | Score |
| --- | --- | ---: |
| `main_score` | `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` | 0.1766 |
| `ndcg_at_10` | `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` | 0.1766 |
| `map_at_10` | `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` | 0.1343 |
| `mrr_at_10` | `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` | 0.2005 |
| `hit_rate_at_10` | `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` | 0.3721 |
| `recall_at_10` | `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` | 0.2207 |
| `ndcg_at_100` | `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` | 0.2370 |
| `hit_rate_at_100` | `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` | 0.5814 |

<!-- END MTEB LEADERBOARD -->
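The leaderboard scores come from a retrieval evaluation: a model ranks `corpus` documents for each query in `queries`, and the ranking is checked against `qrels`. What follows is a minimal, hypothetical re-implementation of the five @10 metrics in plain Python, not the MTEB code that produced the tables. It assumes binary relevance and BEIR/MTEB-style `qrels` column names (`query-id`, `corpus-id`, `score`), which this card does not confirm. For MAP it divides by the total number of relevant documents; other implementations normalize differently. The names `build_qrels`, `load_qrels_from_hub`, `per_query_metrics`, and `evaluate` are illustrative.

```python
import math
from typing import Any, Dict, Iterable, Mapping, Sequence

# query id -> {corpus doc id: relevance grade}
Qrels = Dict[str, Dict[str, int]]


def build_qrels(rows: Iterable[Mapping[str, Any]]) -> Qrels:
    """Group qrels rows into a nested dict.

    Assumes BEIR/MTEB-style column names ("query-id", "corpus-id", "score");
    the dataset card does not document the actual schema.
    """
    qrels: Qrels = {}
    for row in rows:
        qid, did = str(row["query-id"]), str(row["corpus-id"])
        qrels.setdefault(qid, {})[did] = int(row["score"])
    return qrels


def load_qrels_from_hub() -> Qrels:
    """Hypothetical helper: needs the `datasets` package and network access."""
    from datasets import load_dataset  # imported lazily so the rest runs offline

    rows = load_dataset("MohammadKhodadad/multi-lingual-qac", "qrels", split="train")
    return build_qrels(rows)


def per_query_metrics(
    relevant: Mapping[str, int], ranking: Sequence[str], k: int = 10
) -> Dict[str, float]:
    """Score one ranked list (best first, unique ids) against its judgments."""
    top_k = list(ranking[:k])
    rel_ids = {doc for doc, grade in relevant.items() if grade > 0}
    n_rel = len(rel_ids)

    # nDCG: linear gain, log2(rank + 1) discount, ranks starting at 1.
    dcg = sum(
        relevant.get(doc, 0) / math.log2(rank + 1)
        for rank, doc in enumerate(top_k, start=1)
    )
    ideal = sorted((g for g in relevant.values() if g > 0), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))

    hits, precision_sum, first_hit = 0, 0.0, 0
    for rank, doc in enumerate(top_k, start=1):
        if doc in rel_ids:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
            first_hit = first_hit or rank  # keep only the first relevant rank

    return {
        f"ndcg_at_{k}": dcg / idcg if idcg > 0 else 0.0,
        f"map_at_{k}": precision_sum / n_rel if n_rel else 0.0,  # divides by all relevant docs
        f"mrr_at_{k}": 1.0 / first_hit if first_hit else 0.0,
        f"hit_rate_at_{k}": 1.0 if hits else 0.0,
        f"recall_at_{k}": hits / n_rel if n_rel else 0.0,
    }


def evaluate(qrels: Qrels, run: Mapping[str, Sequence[str]], k: int = 10) -> Dict[str, float]:
    """Macro-average per-query metrics; judged queries missing from `run` score 0."""
    totals: Dict[str, float] = {}
    for qid, relevant in qrels.items():
        for name, value in per_query_metrics(relevant, run.get(qid, []), k).items():
            totals[name] = totals.get(name, 0.0) + value
    return {name: total / len(qrels) for name, total in totals.items()}
```

Because each query is linked to both documents of a language pair, a query whose top 10 contains only one of them counts as a full hit but only half a recall, so Hit@10 can sit well above Recall@10. For example, with two judged queries (two relevant documents each) where one run retrieves a single relevant document at rank 2 and the other retrieves nothing, `evaluate` returns `hit_rate_at_10` = 0.5 but `recall_at_10` = 0.25.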
Provided by:
MohammadKhodadad