CATIE-AQ/NanoBEIR-fr
Source: Hugging Face. Updated 2025-12-10; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/CATIE-AQ/NanoBEIR-fr
Resource description:
---
dataset_info:
- config_name: bm25
  features:
  - name: query-id
    dtype: string
  - name: corpus-ids
    list: string
  splits:
  - name: NanoTouche2020
    num_bytes: 12088242
    num_examples: 49
  - name: NanoClimateFEVER
    num_bytes: 3922584
    num_examples: 50
  - name: NanoDBPedia
    num_bytes: 10592868
    num_examples: 50
  - name: NanoFEVER
    num_bytes: 5923327
    num_examples: 50
  - name: NanoFiQA2018
    num_bytes: 2241397
    num_examples: 50
  - name: NanoHotpotQA
    num_bytes: 2937600
    num_examples: 50
  - name: NanoMSMARCO
    num_bytes: 2745017
    num_examples: 50
  - name: NanoNFCorpus
    num_bytes: 1765636
    num_examples: 50
  - name: NanoNQ
    num_bytes: 3387630
    num_examples: 50
  - name: NanoQuoraRetrieval
    num_bytes: 2463039
    num_examples: 50
  - name: NanoSCIDOCS
    num_bytes: 4107600
    num_examples: 50
  - name: NanoArguAna
    num_bytes: 7017348
    num_examples: 50
  - name: NanoSciFact
    num_bytes: 1696812
    num_examples: 50
  download_size: 5119406
  dataset_size: 60889100
- config_name: corpus
  features:
  - name: _id
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: NanoArguAna
    num_bytes: 4548966
    num_examples: 3635
  - name: NanoClimateFEVER
    num_bytes: 6585613
    num_examples: 3408
  - name: NanoDBPedia
    num_bytes: 2530945
    num_examples: 6045
  - name: NanoFEVER
    num_bytes: 7008374
    num_examples: 4996
  - name: NanoFiQA2018
    num_bytes: 5168907
    num_examples: 4598
  - name: NanoHotpotQA
    num_bytes: 2118064
    num_examples: 5090
  - name: NanoMSMARCO
    num_bytes: 2076213
    num_examples: 5043
  - name: NanoNFCorpus
    num_bytes: 5613148
    num_examples: 2953
  - name: NanoNQ
    num_bytes: 3168850
    num_examples: 5035
  - name: NanoQuoraRetrieval
    num_bytes: 430778
    num_examples: 5046
  - name: NanoSCIDOCS
    num_bytes: 2633021
    num_examples: 2210
  - name: NanoSciFact
    num_bytes: 5186190
    num_examples: 2919
  - name: NanoTouche2020
    num_bytes: 15257303
    num_examples: 5745
  download_size: 35406161
  dataset_size: 62326372
- config_name: qrels
  features:
  - name: query-id
    dtype: string
  - name: corpus-id
    dtype: string
  splits:
  - name: NanoArguAna
    num_bytes: 3496
    num_examples: 50
  - name: NanoClimateFEVER
    num_bytes: 4361
    num_examples: 148
  - name: NanoDBPedia
    num_bytes: 60640
    num_examples: 1158
  - name: NanoFEVER
    num_bytes: 1630
    num_examples: 57
  - name: NanoFiQA2018
    num_bytes: 2200
    num_examples: 123
  - name: NanoHotpotQA
    num_bytes: 3885
    num_examples: 100
  - name: NanoMSMARCO
    num_bytes: 1065
    num_examples: 50
  - name: NanoNFCorpus
    num_bytes: 64851
    num_examples: 2518
  - name: NanoNQ
    num_bytes: 1340
    num_examples: 57
  - name: NanoQuoraRetrieval
    num_bytes: 1359
    num_examples: 70
  - name: NanoSCIDOCS
    num_bytes: 21472
    num_examples: 244
  - name: NanoSciFact
    num_bytes: 1054
    num_examples: 56
  - name: NanoTouche2020
    num_bytes: 45452
    num_examples: 932
  download_size: 91853
  dataset_size: 212805
- config_name: queries
  features:
  - name: _id
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: NanoArguAna
    num_bytes: 73373
    num_examples: 50
  - name: NanoClimateFEVER
    num_bytes: 8910
    num_examples: 50
  - name: NanoDBPedia
    num_bytes: 2960
    num_examples: 50
  - name: NanoFEVER
    num_bytes: 3244
    num_examples: 50
  - name: NanoFiQA2018
    num_bytes: 4476
    num_examples: 50
  - name: NanoHotpotQA
    num_bytes: 6707
    num_examples: 50
  - name: NanoMSMARCO
    num_bytes: 2806
    num_examples: 50
  - name: NanoNFCorpus
    num_bytes: 2200
    num_examples: 50
  - name: NanoNQ
    num_bytes: 3514
    num_examples: 50
  - name: NanoQuoraRetrieval
    num_bytes: 3903
    num_examples: 50
  - name: NanoSCIDOCS
    num_bytes: 7082
    num_examples: 50
  - name: NanoSciFact
    num_bytes: 6563
    num_examples: 50
  - name: NanoTouche2020
    num_bytes: 3516
    num_examples: 49
  download_size: 109067
  dataset_size: 129254
configs:
- config_name: bm25
  data_files:
  - split: NanoTouche2020
    path: bm25/NanoTouche2020-*
  - split: NanoClimateFEVER
    path: bm25/NanoClimateFEVER-*
  - split: NanoDBPedia
    path: bm25/NanoDBPedia-*
  - split: NanoFEVER
    path: bm25/NanoFEVER-*
  - split: NanoFiQA2018
    path: bm25/NanoFiQA2018-*
  - split: NanoHotpotQA
    path: bm25/NanoHotpotQA-*
  - split: NanoMSMARCO
    path: bm25/NanoMSMARCO-*
  - split: NanoNFCorpus
    path: bm25/NanoNFCorpus-*
  - split: NanoNQ
    path: bm25/NanoNQ-*
  - split: NanoQuoraRetrieval
    path: bm25/NanoQuoraRetrieval-*
  - split: NanoSCIDOCS
    path: bm25/NanoSCIDOCS-*
  - split: NanoArguAna
    path: bm25/NanoArguAna-*
  - split: NanoSciFact
    path: bm25/NanoSciFact-*
- config_name: corpus
  data_files:
  - split: NanoArguAna
    path: corpus/NanoArguAna-*
  - split: NanoClimateFEVER
    path: corpus/NanoClimateFEVER-*
  - split: NanoDBPedia
    path: corpus/NanoDBPedia-*
  - split: NanoFEVER
    path: corpus/NanoFEVER-*
  - split: NanoFiQA2018
    path: corpus/NanoFiQA2018-*
  - split: NanoHotpotQA
    path: corpus/NanoHotpotQA-*
  - split: NanoMSMARCO
    path: corpus/NanoMSMARCO-*
  - split: NanoNFCorpus
    path: corpus/NanoNFCorpus-*
  - split: NanoNQ
    path: corpus/NanoNQ-*
  - split: NanoQuoraRetrieval
    path: corpus/NanoQuoraRetrieval-*
  - split: NanoSCIDOCS
    path: corpus/NanoSCIDOCS-*
  - split: NanoSciFact
    path: corpus/NanoSciFact-*
  - split: NanoTouche2020
    path: corpus/NanoTouche2020-*
- config_name: qrels
  data_files:
  - split: NanoArguAna
    path: qrels/NanoArguAna-*
  - split: NanoClimateFEVER
    path: qrels/NanoClimateFEVER-*
  - split: NanoDBPedia
    path: qrels/NanoDBPedia-*
  - split: NanoFEVER
    path: qrels/NanoFEVER-*
  - split: NanoFiQA2018
    path: qrels/NanoFiQA2018-*
  - split: NanoHotpotQA
    path: qrels/NanoHotpotQA-*
  - split: NanoMSMARCO
    path: qrels/NanoMSMARCO-*
  - split: NanoNFCorpus
    path: qrels/NanoNFCorpus-*
  - split: NanoNQ
    path: qrels/NanoNQ-*
  - split: NanoQuoraRetrieval
    path: qrels/NanoQuoraRetrieval-*
  - split: NanoSCIDOCS
    path: qrels/NanoSCIDOCS-*
  - split: NanoSciFact
    path: qrels/NanoSciFact-*
  - split: NanoTouche2020
    path: qrels/NanoTouche2020-*
- config_name: queries
  data_files:
  - split: NanoArguAna
    path: queries/NanoArguAna-*
  - split: NanoClimateFEVER
    path: queries/NanoClimateFEVER-*
  - split: NanoDBPedia
    path: queries/NanoDBPedia-*
  - split: NanoFEVER
    path: queries/NanoFEVER-*
  - split: NanoFiQA2018
    path: queries/NanoFiQA2018-*
  - split: NanoHotpotQA
    path: queries/NanoHotpotQA-*
  - split: NanoMSMARCO
    path: queries/NanoMSMARCO-*
  - split: NanoNFCorpus
    path: queries/NanoNFCorpus-*
  - split: NanoNQ
    path: queries/NanoNQ-*
  - split: NanoQuoraRetrieval
    path: queries/NanoQuoraRetrieval-*
  - split: NanoSCIDOCS
    path: queries/NanoSCIDOCS-*
  - split: NanoSciFact
    path: queries/NanoSciFact-*
  - split: NanoTouche2020
    path: queries/NanoTouche2020-*
language:
- fra
---
## Description
NanoBEIR translated into French (using the Microsoft API).
In this [collection](https://huggingface.co/collections/CATIE-AQ/nanobeir-fr), we translated the datasets while preserving the original formatting of [zeta-alpha-ai](https://huggingface.co/collections/zeta-alpha-ai/nanobeir). Here, by contrast, the format was designed with Tom Aarsen to be compatible with the `Sentence Transformers` library. This dataset contains the 13 datasets that make up NanoBEIR.
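
As an illustration of this format, the sketch below loads one subset and reshapes it into three plain dictionaries. It assumes the `datasets` library is installed and uses the column names listed in the metadata above (`_id`, `text`, `query-id`, `corpus-id`). The helper names (`rows_to_text_map`, `rows_to_relevant_docs`, `load_subset`) and the choice of `NanoSciFact` are hypothetical, picked only for the example.

```python
from collections import defaultdict

DATASET_ID = "CATIE-AQ/NanoBEIR-fr"


def rows_to_text_map(rows):
    """Map each `_id` to its `text` (works for `queries` and `corpus` rows)."""
    return {row["_id"]: row["text"] for row in rows}


def rows_to_relevant_docs(rows):
    """Group `qrels` rows into {query-id: {corpus-id, ...}}."""
    relevant = defaultdict(set)
    for row in rows:
        relevant[row["query-id"]].add(row["corpus-id"])
    return dict(relevant)


def load_subset(split_name="NanoSciFact"):
    """Download one subset (needs network access and `pip install datasets`)."""
    # Imported lazily so the two helpers above stay dependency-free.
    from datasets import load_dataset

    queries = rows_to_text_map(load_dataset(DATASET_ID, "queries", split=split_name))
    corpus = rows_to_text_map(load_dataset(DATASET_ID, "corpus", split=split_name))
    relevant_docs = rows_to_relevant_docs(load_dataset(DATASET_ID, "qrels", split=split_name))
    return queries, corpus, relevant_docs
```

The three returned dictionaries have the shapes that `InformationRetrievalEvaluator` in Sentence Transformers takes as its `queries`, `corpus`, and `relevant_docs` arguments.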
We have also added a `bm25` configuration, computed with the `bm25s` library following the same methodology Tom Aarsen used for the [English dataset](https://huggingface.co/datasets/sentence-transformers/NanoBEIR-en). As a result, the dataset is compatible not only with `SparseEncoder` and `SentenceTransformer` models but also with `CrossEncoder` models, since `CrossEncoderNanoBEIREvaluator` requires BM25 data.
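
To make the role of the BM25 data concrete, here is a minimal sketch of how the `bm25` rows could serve as first-stage candidates for a reranker, with a simple recall check against the `qrels`. It assumes that `corpus-ids` is ordered from best to worst BM25 match, which the card does not state explicitly. The function names and the toy rows are made up for the example and are not part of any library.

```python
def bm25_candidates(bm25_rows, top_k=100):
    """Return {query-id: first `top_k` corpus ids} from rows of the `bm25` config."""
    # Assumption: `corpus-ids` is already sorted by decreasing BM25 score.
    return {row["query-id"]: list(row["corpus-ids"][:top_k]) for row in bm25_rows}


def recall_of_candidates(candidates, relevant_docs):
    """Mean fraction of each query's relevant documents present in its candidates."""
    scores = []
    for query_id, relevant in relevant_docs.items():
        if not relevant:
            continue  # skip queries without any relevance judgment
        found = relevant.intersection(candidates.get(query_id, []))
        scores.append(len(found) / len(relevant))
    return sum(scores) / len(scores) if scores else 0.0


# Toy rows in the same shape as the `bm25` and `qrels` configs (invented ids).
toy_bm25 = [
    {"query-id": "q1", "corpus-ids": ["d3", "d1", "d2"]},
    {"query-id": "q2", "corpus-ids": ["d9", "d8"]},
]
toy_relevant = {"q1": {"d1", "d2"}, "q2": {"d8"}}

toy_candidates = bm25_candidates(toy_bm25, top_k=2)
print(recall_of_candidates(toy_candidates, toy_relevant))  # 0.75
```

A reranker then only has to score the candidate documents of each query rather than the whole corpus, which is why a cross-encoder evaluation needs this first-stage ranking.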
Finally, if you are interested in languages other than French, we invite you to check out this [collection](https://huggingface.co/collections/sentence-transformers/nanobeir-datasets).
Provider:
CATIE-AQ



