
CATIE-AQ/NanoBEIR-fr

Hugging Face · Updated 2025-12-10 · Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/CATIE-AQ/NanoBEIR-fr
Resource description:
---
dataset_info:
- config_name: bm25
  features:
  - name: query-id
    dtype: string
  - name: corpus-ids
    list: string
  splits:
  - name: NanoTouche2020
    num_bytes: 12088242
    num_examples: 49
  - name: NanoClimateFEVER
    num_bytes: 3922584
    num_examples: 50
  - name: NanoDBPedia
    num_bytes: 10592868
    num_examples: 50
  - name: NanoFEVER
    num_bytes: 5923327
    num_examples: 50
  - name: NanoFiQA2018
    num_bytes: 2241397
    num_examples: 50
  - name: NanoHotpotQA
    num_bytes: 2937600
    num_examples: 50
  - name: NanoMSMARCO
    num_bytes: 2745017
    num_examples: 50
  - name: NanoNFCorpus
    num_bytes: 1765636
    num_examples: 50
  - name: NanoNQ
    num_bytes: 3387630
    num_examples: 50
  - name: NanoQuoraRetrieval
    num_bytes: 2463039
    num_examples: 50
  - name: NanoSCIDOCS
    num_bytes: 4107600
    num_examples: 50
  - name: NanoArguAna
    num_bytes: 7017348
    num_examples: 50
  - name: NanoSciFact
    num_bytes: 1696812
    num_examples: 50
  download_size: 5119406
  dataset_size: 60889100
- config_name: corpus
  features:
  - name: _id
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: NanoArguAna
    num_bytes: 4548966
    num_examples: 3635
  - name: NanoClimateFEVER
    num_bytes: 6585613
    num_examples: 3408
  - name: NanoDBPedia
    num_bytes: 2530945
    num_examples: 6045
  - name: NanoFEVER
    num_bytes: 7008374
    num_examples: 4996
  - name: NanoFiQA2018
    num_bytes: 5168907
    num_examples: 4598
  - name: NanoHotpotQA
    num_bytes: 2118064
    num_examples: 5090
  - name: NanoMSMARCO
    num_bytes: 2076213
    num_examples: 5043
  - name: NanoNFCorpus
    num_bytes: 5613148
    num_examples: 2953
  - name: NanoNQ
    num_bytes: 3168850
    num_examples: 5035
  - name: NanoQuoraRetrieval
    num_bytes: 430778
    num_examples: 5046
  - name: NanoSCIDOCS
    num_bytes: 2633021
    num_examples: 2210
  - name: NanoSciFact
    num_bytes: 5186190
    num_examples: 2919
  - name: NanoTouche2020
    num_bytes: 15257303
    num_examples: 5745
  download_size: 35406161
  dataset_size: 62326372
- config_name: qrels
  features:
  - name: query-id
    dtype: string
  - name: corpus-id
    dtype: string
  splits:
  - name: NanoArguAna
    num_bytes: 3496
    num_examples: 50
  - name: NanoClimateFEVER
    num_bytes: 4361
    num_examples: 148
  - name: NanoDBPedia
    num_bytes: 60640
    num_examples: 1158
  - name: NanoFEVER
    num_bytes: 1630
    num_examples: 57
  - name: NanoFiQA2018
    num_bytes: 2200
    num_examples: 123
  - name: NanoHotpotQA
    num_bytes: 3885
    num_examples: 100
  - name: NanoMSMARCO
    num_bytes: 1065
    num_examples: 50
  - name: NanoNFCorpus
    num_bytes: 64851
    num_examples: 2518
  - name: NanoNQ
    num_bytes: 1340
    num_examples: 57
  - name: NanoQuoraRetrieval
    num_bytes: 1359
    num_examples: 70
  - name: NanoSCIDOCS
    num_bytes: 21472
    num_examples: 244
  - name: NanoSciFact
    num_bytes: 1054
    num_examples: 56
  - name: NanoTouche2020
    num_bytes: 45452
    num_examples: 932
  download_size: 91853
  dataset_size: 212805
- config_name: queries
  features:
  - name: _id
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: NanoArguAna
    num_bytes: 73373
    num_examples: 50
  - name: NanoClimateFEVER
    num_bytes: 8910
    num_examples: 50
  - name: NanoDBPedia
    num_bytes: 2960
    num_examples: 50
  - name: NanoFEVER
    num_bytes: 3244
    num_examples: 50
  - name: NanoFiQA2018
    num_bytes: 4476
    num_examples: 50
  - name: NanoHotpotQA
    num_bytes: 6707
    num_examples: 50
  - name: NanoMSMARCO
    num_bytes: 2806
    num_examples: 50
  - name: NanoNFCorpus
    num_bytes: 2200
    num_examples: 50
  - name: NanoNQ
    num_bytes: 3514
    num_examples: 50
  - name: NanoQuoraRetrieval
    num_bytes: 3903
    num_examples: 50
  - name: NanoSCIDOCS
    num_bytes: 7082
    num_examples: 50
  - name: NanoSciFact
    num_bytes: 6563
    num_examples: 50
  - name: NanoTouche2020
    num_bytes: 3516
    num_examples: 49
  download_size: 109067
  dataset_size: 129254
configs:
- config_name: bm25
  data_files:
  - split: NanoTouche2020
    path: bm25/NanoTouche2020-*
  - split: NanoClimateFEVER
    path: bm25/NanoClimateFEVER-*
  - split: NanoDBPedia
    path: bm25/NanoDBPedia-*
  - split: NanoFEVER
    path: bm25/NanoFEVER-*
  - split: NanoFiQA2018
    path: bm25/NanoFiQA2018-*
  - split: NanoHotpotQA
    path: bm25/NanoHotpotQA-*
  - split: NanoMSMARCO
    path: bm25/NanoMSMARCO-*
  - split: NanoNFCorpus
    path: bm25/NanoNFCorpus-*
  - split: NanoNQ
    path: bm25/NanoNQ-*
  - split: NanoQuoraRetrieval
    path: bm25/NanoQuoraRetrieval-*
  - split: NanoSCIDOCS
    path: bm25/NanoSCIDOCS-*
  - split: NanoArguAna
    path: bm25/NanoArguAna-*
  - split: NanoSciFact
    path: bm25/NanoSciFact-*
- config_name: corpus
  data_files:
  - split: NanoArguAna
    path: corpus/NanoArguAna-*
  - split: NanoClimateFEVER
    path: corpus/NanoClimateFEVER-*
  - split: NanoDBPedia
    path: corpus/NanoDBPedia-*
  - split: NanoFEVER
    path: corpus/NanoFEVER-*
  - split: NanoFiQA2018
    path: corpus/NanoFiQA2018-*
  - split: NanoHotpotQA
    path: corpus/NanoHotpotQA-*
  - split: NanoMSMARCO
    path: corpus/NanoMSMARCO-*
  - split: NanoNFCorpus
    path: corpus/NanoNFCorpus-*
  - split: NanoNQ
    path: corpus/NanoNQ-*
  - split: NanoQuoraRetrieval
    path: corpus/NanoQuoraRetrieval-*
  - split: NanoSCIDOCS
    path: corpus/NanoSCIDOCS-*
  - split: NanoSciFact
    path: corpus/NanoSciFact-*
  - split: NanoTouche2020
    path: corpus/NanoTouche2020-*
- config_name: qrels
  data_files:
  - split: NanoArguAna
    path: qrels/NanoArguAna-*
  - split: NanoClimateFEVER
    path: qrels/NanoClimateFEVER-*
  - split: NanoDBPedia
    path: qrels/NanoDBPedia-*
  - split: NanoFEVER
    path: qrels/NanoFEVER-*
  - split: NanoFiQA2018
    path: qrels/NanoFiQA2018-*
  - split: NanoHotpotQA
    path: qrels/NanoHotpotQA-*
  - split: NanoMSMARCO
    path: qrels/NanoMSMARCO-*
  - split: NanoNFCorpus
    path: qrels/NanoNFCorpus-*
  - split: NanoNQ
    path: qrels/NanoNQ-*
  - split: NanoQuoraRetrieval
    path: qrels/NanoQuoraRetrieval-*
  - split: NanoSCIDOCS
    path: qrels/NanoSCIDOCS-*
  - split: NanoSciFact
    path: qrels/NanoSciFact-*
  - split: NanoTouche2020
    path: qrels/NanoTouche2020-*
- config_name: queries
  data_files:
  - split: NanoArguAna
    path: queries/NanoArguAna-*
  - split: NanoClimateFEVER
    path: queries/NanoClimateFEVER-*
  - split: NanoDBPedia
    path: queries/NanoDBPedia-*
  - split: NanoFEVER
    path: queries/NanoFEVER-*
  - split: NanoFiQA2018
    path: queries/NanoFiQA2018-*
  - split: NanoHotpotQA
    path: queries/NanoHotpotQA-*
  - split: NanoMSMARCO
    path: queries/NanoMSMARCO-*
  - split: NanoNFCorpus
    path: queries/NanoNFCorpus-*
  - split: NanoNQ
    path: queries/NanoNQ-*
  - split: NanoQuoraRetrieval
    path: queries/NanoQuoraRetrieval-*
  - split: NanoSCIDOCS
    path: queries/NanoSCIDOCS-*
  - split: NanoSciFact
    path: queries/NanoSciFact-*
  - split: NanoTouche2020
    path: queries/NanoTouche2020-*
language:
- fra
---

## Description

NanoBEIR translated into French (using the Microsoft API). In this [collection](https://huggingface.co/collections/CATIE-AQ/nanobeir-fr), we translate the datasets while preserving the original formatting of [zeta-alpha-ai](https://huggingface.co/collections/zeta-alpha-ai/nanobeir); here, by contrast, the format was designed with Tom Aarsen to be compatible with the `Sentence Transformers` library. This dataset contains the 13 datasets that make up NanoBEIR.

We have also added a BM25 configuration (`bm25`), computed with the `bm25s` library following the same methodology as Tom Aarsen for the [English dataset](https://huggingface.co/datasets/sentence-transformers/NanoBEIR-en). As a result, in addition to being compatible with `SparseEncoder` and `SentenceTransformer` models, the dataset is also compatible with `CrossEncoder` models (the `CrossEncoderNanoBEIREvaluator` requires BM25 data).

Finally, if you are interested in languages other than French, see this [collection](https://huggingface.co/collections/sentence-transformers/nanobeir-datasets).
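To make the role of the `qrels` and `bm25` configurations more concrete, here is a minimal sketch of how their rows could be combined to measure how often the BM25 candidates contain the relevant documents. It runs on toy in-memory rows that only mirror the field names listed in the metadata (`query-id`, `corpus-id`, `corpus-ids`); the IDs, the helper names `build_relevant` and `recall_at_k`, and the assumption that `corpus-ids` is ordered from best to worst BM25 score are illustrative and not taken from the dataset card. Real rows could presumably be obtained with `datasets.load_dataset("CATIE-AQ/NanoBEIR-fr", "qrels", split="NanoSciFact")` and the equivalent call for `bm25`.

```python
from collections import defaultdict

# Toy rows shaped like the `qrels` config: one relevant (query, document) pair per row.
# These IDs are invented for illustration only.
qrels_rows = [
    {"query-id": "q1", "corpus-id": "d3"},
    {"query-id": "q1", "corpus-id": "d7"},
    {"query-id": "q2", "corpus-id": "d1"},
]

# Toy rows shaped like the `bm25` config: a candidate list per query.
# Assumption: the list is ordered from best to worst BM25 score.
bm25_rows = [
    {"query-id": "q1", "corpus-ids": ["d3", "d2", "d7", "d9"]},
    {"query-id": "q2", "corpus-ids": ["d4", "d5", "d1", "d8"]},
]


def build_relevant(rows):
    """Group qrels rows into {query-id: set of relevant corpus-ids}."""
    relevant = defaultdict(set)
    for row in rows:
        relevant[row["query-id"]].add(row["corpus-id"])
    return dict(relevant)


def recall_at_k(candidate_rows, relevant, k):
    """Mean fraction of relevant documents found in the top-k candidates."""
    scores = []
    for row in candidate_rows:
        gold = relevant.get(row["query-id"], set())
        if not gold:
            continue  # skip queries without any relevance judgment
        hits = len(gold & set(row["corpus-ids"][:k]))
        scores.append(hits / len(gold))
    return sum(scores) / len(scores) if scores else 0.0


relevant = build_relevant(qrels_rows)
print(recall_at_k(bm25_rows, relevant, k=2))
print(recall_at_k(bm25_rows, relevant, k=3))
```

A reranking evaluator works on the same idea: it only rescores the BM25 candidates, so any relevant document missing from the candidate list cannot be recovered, which is one reason the BM25 data has to ship alongside the queries, corpus, and qrels.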
Provided by:
CATIE-AQ