tomaarsen/NanoBEIR-fr-copy
Source: Hugging Face. Updated 2025-12-10; indexed 2025-12-20.
Download link: https://hf-mirror.com/datasets/tomaarsen/NanoBEIR-fr-copy
Dataset summary:
---
dataset_info:
- config_name: bm25
  features:
  - name: query-id
    dtype: string
  - name: corpus-ids
    list: string
  splits:
  - name: NanoTouche2020
    num_bytes: 12088242
    num_examples: 49
  - name: NanoClimateFEVER
    num_bytes: 3922584
    num_examples: 50
  - name: NanoDBPedia
    num_bytes: 10592868
    num_examples: 50
  - name: NanoFEVER
    num_bytes: 5923327
    num_examples: 50
  - name: NanoFiQA2018
    num_bytes: 2241397
    num_examples: 50
  - name: NanoHotpotQA
    num_bytes: 2937600
    num_examples: 50
  - name: NanoMSMARCO
    num_bytes: 2745017
    num_examples: 50
  - name: NanoNFCorpus
    num_bytes: 1765636
    num_examples: 50
  - name: NanoNQ
    num_bytes: 3387630
    num_examples: 50
  - name: NanoQuoraRetrieval
    num_bytes: 2463039
    num_examples: 50
  - name: NanoSCIDOCS
    num_bytes: 4107600
    num_examples: 50
  - name: NanoArguAna
    num_bytes: 7017348
    num_examples: 50
  - name: NanoSciFact
    num_bytes: 1696812
    num_examples: 50
  download_size: 5119406
  dataset_size: 60889100
- config_name: corpus
  features:
  - name: _id
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: NanoArguAna
    num_bytes: 4592586
    num_examples: 3635
  - name: NanoClimateFEVER
    num_bytes: 6643549
    num_examples: 3408
  - name: NanoDBPedia
    num_bytes: 2603485
    num_examples: 6045
  - name: NanoFEVER
    num_bytes: 7058334
    num_examples: 4996
  - name: NanoFiQA2018
    num_bytes: 5228681
    num_examples: 4598
  - name: NanoHotpotQA
    num_bytes: 2184234
    num_examples: 5090
  - name: NanoMSMARCO
    num_bytes: 2136729
    num_examples: 5043
  - name: NanoNFCorpus
    num_bytes: 5651537
    num_examples: 2953
  - name: NanoNQ
    num_bytes: 3204095
    num_examples: 5035
  - name: NanoQuoraRetrieval
    num_bytes: 526652
    num_examples: 5046
  - name: NanoSCIDOCS
    num_bytes: 2659541
    num_examples: 2210
  - name: NanoSciFact
    num_bytes: 5221218
    num_examples: 2919
  - name: NanoTouche2020
    num_bytes: 15343478
    num_examples: 5745
  download_size: 35452929
  dataset_size: 63054119
- config_name: qrels
  features:
  - name: query-id
    dtype: string
  - name: corpus-id
    dtype: string
  splits:
  - name: NanoArguAna
    num_bytes: 4696
    num_examples: 50
  - name: NanoClimateFEVER
    num_bytes: 9393
    num_examples: 148
  - name: NanoDBPedia
    num_bytes: 88432
    num_examples: 1158
  - name: NanoFEVER
    num_bytes: 2770
    num_examples: 57
  - name: NanoFiQA2018
    num_bytes: 5398
    num_examples: 123
  - name: NanoHotpotQA
    num_bytes: 6485
    num_examples: 100
  - name: NanoMSMARCO
    num_bytes: 2265
    num_examples: 50
  - name: NanoNFCorpus
    num_bytes: 130319
    num_examples: 2518
  - name: NanoNQ
    num_bytes: 2138
    num_examples: 57
  - name: NanoQuoraRetrieval
    num_bytes: 4019
    num_examples: 70
  - name: NanoSCIDOCS
    num_bytes: 27328
    num_examples: 244
  - name: NanoSciFact
    num_bytes: 2398
    num_examples: 56
  - name: NanoTouche2020
    num_bytes: 73412
    num_examples: 932
  download_size: 96655
  dataset_size: 359053
- config_name: queries
  features:
  - name: _id
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: NanoArguAna
    num_bytes: 73973
    num_examples: 50
  - name: NanoClimateFEVER
    num_bytes: 9760
    num_examples: 50
  - name: NanoDBPedia
    num_bytes: 3560
    num_examples: 50
  - name: NanoFEVER
    num_bytes: 3744
    num_examples: 50
  - name: NanoFiQA2018
    num_bytes: 5126
    num_examples: 50
  - name: NanoHotpotQA
    num_bytes: 7357
    num_examples: 50
  - name: NanoMSMARCO
    num_bytes: 3406
    num_examples: 50
  - name: NanoNFCorpus
    num_bytes: 2850
    num_examples: 50
  - name: NanoNQ
    num_bytes: 3864
    num_examples: 50
  - name: NanoQuoraRetrieval
    num_bytes: 4853
    num_examples: 50
  - name: NanoSCIDOCS
    num_bytes: 7682
    num_examples: 50
  - name: NanoSciFact
    num_bytes: 7163
    num_examples: 50
  - name: NanoTouche2020
    num_bytes: 4251
    num_examples: 49
  download_size: 110341
  dataset_size: 137589
configs:
- config_name: bm25
  data_files:
  - split: NanoClimateFEVER
    path: bm25/NanoClimateFEVER-*
  - split: NanoFEVER
    path: bm25/NanoFEVER-*
  - split: NanoFiQA2018
    path: bm25/NanoFiQA2018-*
  - split: NanoMSMARCO
    path: bm25/NanoMSMARCO-*
  - split: NanoTouche2020
    path: bm25/NanoTouche2020-*
  - split: NanoDBPedia
    path: bm25/NanoDBPedia-*
  - split: NanoHotpotQA
    path: bm25/NanoHotpotQA-*
  - split: NanoNFCorpus
    path: bm25/NanoNFCorpus-*
  - split: NanoNQ
    path: bm25/NanoNQ-*
  - split: NanoQuoraRetrieval
    path: bm25/NanoQuoraRetrieval-*
  - split: NanoSCIDOCS
    path: bm25/NanoSCIDOCS-*
  - split: NanoArguAna
    path: bm25/NanoArguAna-*
  - split: NanoSciFact
    path: bm25/NanoSciFact-*
- config_name: corpus
  data_files:
  - split: NanoArguAna
    path: corpus/NanoArguAna-*
  - split: NanoClimateFEVER
    path: corpus/NanoClimateFEVER-*
  - split: NanoDBPedia
    path: corpus/NanoDBPedia-*
  - split: NanoFEVER
    path: corpus/NanoFEVER-*
  - split: NanoFiQA2018
    path: corpus/NanoFiQA2018-*
  - split: NanoHotpotQA
    path: corpus/NanoHotpotQA-*
  - split: NanoMSMARCO
    path: corpus/NanoMSMARCO-*
  - split: NanoNFCorpus
    path: corpus/NanoNFCorpus-*
  - split: NanoNQ
    path: corpus/NanoNQ-*
  - split: NanoQuoraRetrieval
    path: corpus/NanoQuoraRetrieval-*
  - split: NanoSCIDOCS
    path: corpus/NanoSCIDOCS-*
  - split: NanoSciFact
    path: corpus/NanoSciFact-*
  - split: NanoTouche2020
    path: corpus/NanoTouche2020-*
- config_name: qrels
  data_files:
  - split: NanoArguAna
    path: qrels/NanoArguAna-*
  - split: NanoClimateFEVER
    path: qrels/NanoClimateFEVER-*
  - split: NanoDBPedia
    path: qrels/NanoDBPedia-*
  - split: NanoFEVER
    path: qrels/NanoFEVER-*
  - split: NanoFiQA2018
    path: qrels/NanoFiQA2018-*
  - split: NanoHotpotQA
    path: qrels/NanoHotpotQA-*
  - split: NanoMSMARCO
    path: qrels/NanoMSMARCO-*
  - split: NanoNFCorpus
    path: qrels/NanoNFCorpus-*
  - split: NanoNQ
    path: qrels/NanoNQ-*
  - split: NanoQuoraRetrieval
    path: qrels/NanoQuoraRetrieval-*
  - split: NanoSCIDOCS
    path: qrels/NanoSCIDOCS-*
  - split: NanoSciFact
    path: qrels/NanoSciFact-*
  - split: NanoTouche2020
    path: qrels/NanoTouche2020-*
- config_name: queries
  data_files:
  - split: NanoArguAna
    path: queries/NanoArguAna-*
  - split: NanoClimateFEVER
    path: queries/NanoClimateFEVER-*
  - split: NanoDBPedia
    path: queries/NanoDBPedia-*
  - split: NanoFEVER
    path: queries/NanoFEVER-*
  - split: NanoFiQA2018
    path: queries/NanoFiQA2018-*
  - split: NanoHotpotQA
    path: queries/NanoHotpotQA-*
  - split: NanoMSMARCO
    path: queries/NanoMSMARCO-*
  - split: NanoNFCorpus
    path: queries/NanoNFCorpus-*
  - split: NanoNQ
    path: queries/NanoNQ-*
  - split: NanoQuoraRetrieval
    path: queries/NanoQuoraRetrieval-*
  - split: NanoSCIDOCS
    path: queries/NanoSCIDOCS-*
  - split: NanoSciFact
    path: queries/NanoSciFact-*
  - split: NanoTouche2020
    path: queries/NanoTouche2020-*
language:
- fra
---
## Description
NanoBEIR translated into French (using the Microsoft API).
In this [collection](https://huggingface.co/collections/CATIE-AQ/nanobeir-fr), we translated the datasets while preserving the original formatting from [zeta-alpha-ai](https://huggingface.co/collections/zeta-alpha-ai/nanobeir). Here, by contrast, the format was designed with Tom Aarsen to be compatible with the `Sentence Transformers` library. This dataset contains the 13 datasets that make up NanoBEIR.
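As a minimal sketch of how the four configs listed in the metadata (`corpus`, `queries`, `qrels`, `bm25`) might be loaded for one split, the snippet below uses `load_dataset` from the `datasets` library. The helper names `load_split` and `to_lookup` are hypothetical and introduced only for illustration; loading requires `datasets` to be installed and network access to the Hub.

```python
# Sketch only: helper names are illustrative, not part of any library.
REPO_ID = "tomaarsen/NanoBEIR-fr-copy"  # repository name taken from this page
CONFIGS = ("corpus", "queries", "qrels", "bm25")  # config names from the metadata


def load_split(split_name: str, repo_id: str = REPO_ID) -> dict:
    """Return {config_name: Dataset} for one split, e.g. "NanoSciFact"."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return {cfg: load_dataset(repo_id, cfg, split=split_name) for cfg in CONFIGS}


def to_lookup(rows) -> dict:
    """Map `_id` to `text` for rows of the `corpus` or `queries` config."""
    return {row["_id"]: row["text"] for row in rows}
```

For example, `data = load_split("NanoSciFact")` followed by `corpus = to_lookup(data["corpus"])` would give an ID-to-text dictionary for that split's documents.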
We have also added a BM25 subset, computed with the `bm25s` library using the same methodology as Tom Aarsen for the [English dataset](https://huggingface.co/datasets/sentence-transformers/NanoBEIR-en). As a result, the dataset is compatible not only with `SparseEncoder` and `SentenceTransformer` models but also with `CrossEncoder` models, since `CrossEncoderNanoBEIREvaluator` requires BM25 data.
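To illustrate why BM25 candidates matter for reranking, here is a small sketch that joins `qrels` rows with `bm25` rows and measures how many relevant documents appear among the first k candidates. A reranker can only surface documents that are already in its candidate list, so this recall acts as a ceiling. The field names follow the metadata above; the toy rows are invented, and the assumption that `corpus-ids` is ordered from best to worst BM25 match is not confirmed on this page.

```python
from collections import defaultdict


def build_relevant_docs(qrels_rows):
    """Group qrels rows into {query-id: set of relevant corpus-ids}."""
    relevant = defaultdict(set)
    for row in qrels_rows:
        relevant[row["query-id"]].add(row["corpus-id"])
    return dict(relevant)


def bm25_recall_at_k(bm25_rows, relevant, k):
    """Mean fraction of relevant documents found in the first k candidates."""
    scores = []
    for row in bm25_rows:
        gold = relevant.get(row["query-id"], set())
        if not gold:
            continue  # skip queries without any relevance judgment
        # Assumption: `corpus-ids` is ordered from best to worst BM25 match.
        hits = len(gold & set(row["corpus-ids"][:k]))
        scores.append(hits / len(gold))
    return sum(scores) / len(scores) if scores else 0.0


# Toy rows, invented for illustration (not taken from the dataset).
qrels_rows = [
    {"query-id": "q1", "corpus-id": "d1"},
    {"query-id": "q1", "corpus-id": "d4"},
    {"query-id": "q2", "corpus-id": "d2"},
]
bm25_rows = [
    {"query-id": "q1", "corpus-ids": ["d1", "d3", "d4"]},
    {"query-id": "q2", "corpus-ids": ["d5", "d2", "d1"]},
]

relevant = build_relevant_docs(qrels_rows)
recall_at_2 = bm25_recall_at_k(bm25_rows, relevant, k=2)
recall_at_3 = bm25_recall_at_k(bm25_rows, relevant, k=3)
```

The same two functions should work on rows from the real `qrels` and `bm25` configs, since they rely only on the column names shown in the metadata.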
Finally, if you are interested in languages other than French, we invite you to check out this [collection](https://huggingface.co/collections/sentence-transformers/nanobeir-datasets).
Provided by: tomaarsen



