
tomaarsen/NanoBEIR-fr-copy

Source: Hugging Face. Updated 2025-12-10. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/tomaarsen/NanoBEIR-fr-copy
Resource description:
---
dataset_info:
- config_name: bm25
  features:
  - name: query-id
    dtype: string
  - name: corpus-ids
    list: string
  splits:
  - name: NanoTouche2020
    num_bytes: 12088242
    num_examples: 49
  - name: NanoClimateFEVER
    num_bytes: 3922584
    num_examples: 50
  - name: NanoDBPedia
    num_bytes: 10592868
    num_examples: 50
  - name: NanoFEVER
    num_bytes: 5923327
    num_examples: 50
  - name: NanoFiQA2018
    num_bytes: 2241397
    num_examples: 50
  - name: NanoHotpotQA
    num_bytes: 2937600
    num_examples: 50
  - name: NanoMSMARCO
    num_bytes: 2745017
    num_examples: 50
  - name: NanoNFCorpus
    num_bytes: 1765636
    num_examples: 50
  - name: NanoNQ
    num_bytes: 3387630
    num_examples: 50
  - name: NanoQuoraRetrieval
    num_bytes: 2463039
    num_examples: 50
  - name: NanoSCIDOCS
    num_bytes: 4107600
    num_examples: 50
  - name: NanoArguAna
    num_bytes: 7017348
    num_examples: 50
  - name: NanoSciFact
    num_bytes: 1696812
    num_examples: 50
  download_size: 5119406
  dataset_size: 60889100
- config_name: corpus
  features:
  - name: _id
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: NanoArguAna
    num_bytes: 4592586
    num_examples: 3635
  - name: NanoClimateFEVER
    num_bytes: 6643549
    num_examples: 3408
  - name: NanoDBPedia
    num_bytes: 2603485
    num_examples: 6045
  - name: NanoFEVER
    num_bytes: 7058334
    num_examples: 4996
  - name: NanoFiQA2018
    num_bytes: 5228681
    num_examples: 4598
  - name: NanoHotpotQA
    num_bytes: 2184234
    num_examples: 5090
  - name: NanoMSMARCO
    num_bytes: 2136729
    num_examples: 5043
  - name: NanoNFCorpus
    num_bytes: 5651537
    num_examples: 2953
  - name: NanoNQ
    num_bytes: 3204095
    num_examples: 5035
  - name: NanoQuoraRetrieval
    num_bytes: 526652
    num_examples: 5046
  - name: NanoSCIDOCS
    num_bytes: 2659541
    num_examples: 2210
  - name: NanoSciFact
    num_bytes: 5221218
    num_examples: 2919
  - name: NanoTouche2020
    num_bytes: 15343478
    num_examples: 5745
  download_size: 35452929
  dataset_size: 63054119
- config_name: qrels
  features:
  - name: query-id
    dtype: string
  - name: corpus-id
    dtype: string
  splits:
  - name: NanoArguAna
    num_bytes: 4696
    num_examples: 50
  - name: NanoClimateFEVER
    num_bytes: 9393
    num_examples: 148
  - name: NanoDBPedia
    num_bytes: 88432
    num_examples: 1158
  - name: NanoFEVER
    num_bytes: 2770
    num_examples: 57
  - name: NanoFiQA2018
    num_bytes: 5398
    num_examples: 123
  - name: NanoHotpotQA
    num_bytes: 6485
    num_examples: 100
  - name: NanoMSMARCO
    num_bytes: 2265
    num_examples: 50
  - name: NanoNFCorpus
    num_bytes: 130319
    num_examples: 2518
  - name: NanoNQ
    num_bytes: 2138
    num_examples: 57
  - name: NanoQuoraRetrieval
    num_bytes: 4019
    num_examples: 70
  - name: NanoSCIDOCS
    num_bytes: 27328
    num_examples: 244
  - name: NanoSciFact
    num_bytes: 2398
    num_examples: 56
  - name: NanoTouche2020
    num_bytes: 73412
    num_examples: 932
  download_size: 96655
  dataset_size: 359053
- config_name: queries
  features:
  - name: _id
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: NanoArguAna
    num_bytes: 73973
    num_examples: 50
  - name: NanoClimateFEVER
    num_bytes: 9760
    num_examples: 50
  - name: NanoDBPedia
    num_bytes: 3560
    num_examples: 50
  - name: NanoFEVER
    num_bytes: 3744
    num_examples: 50
  - name: NanoFiQA2018
    num_bytes: 5126
    num_examples: 50
  - name: NanoHotpotQA
    num_bytes: 7357
    num_examples: 50
  - name: NanoMSMARCO
    num_bytes: 3406
    num_examples: 50
  - name: NanoNFCorpus
    num_bytes: 2850
    num_examples: 50
  - name: NanoNQ
    num_bytes: 3864
    num_examples: 50
  - name: NanoQuoraRetrieval
    num_bytes: 4853
    num_examples: 50
  - name: NanoSCIDOCS
    num_bytes: 7682
    num_examples: 50
  - name: NanoSciFact
    num_bytes: 7163
    num_examples: 50
  - name: NanoTouche2020
    num_bytes: 4251
    num_examples: 49
  download_size: 110341
  dataset_size: 137589
configs:
- config_name: bm25
  data_files:
  - split: NanoClimateFEVER
    path: bm25/NanoClimateFEVER-*
  - split: NanoFEVER
    path: bm25/NanoFEVER-*
  - split: NanoFiQA2018
    path: bm25/NanoFiQA2018-*
  - split: NanoMSMARCO
    path: bm25/NanoMSMARCO-*
  - split: NanoTouche2020
    path: bm25/NanoTouche2020-*
  - split: NanoDBPedia
    path: bm25/NanoDBPedia-*
  - split: NanoHotpotQA
    path: bm25/NanoHotpotQA-*
  - split: NanoNFCorpus
    path: bm25/NanoNFCorpus-*
  - split: NanoNQ
    path: bm25/NanoNQ-*
  - split: NanoQuoraRetrieval
    path: bm25/NanoQuoraRetrieval-*
  - split: NanoSCIDOCS
    path: bm25/NanoSCIDOCS-*
  - split: NanoArguAna
    path: bm25/NanoArguAna-*
  - split: NanoSciFact
    path: bm25/NanoSciFact-*
- config_name: corpus
  data_files:
  - split: NanoArguAna
    path: corpus/NanoArguAna-*
  - split: NanoClimateFEVER
    path: corpus/NanoClimateFEVER-*
  - split: NanoDBPedia
    path: corpus/NanoDBPedia-*
  - split: NanoFEVER
    path: corpus/NanoFEVER-*
  - split: NanoFiQA2018
    path: corpus/NanoFiQA2018-*
  - split: NanoHotpotQA
    path: corpus/NanoHotpotQA-*
  - split: NanoMSMARCO
    path: corpus/NanoMSMARCO-*
  - split: NanoNFCorpus
    path: corpus/NanoNFCorpus-*
  - split: NanoNQ
    path: corpus/NanoNQ-*
  - split: NanoQuoraRetrieval
    path: corpus/NanoQuoraRetrieval-*
  - split: NanoSCIDOCS
    path: corpus/NanoSCIDOCS-*
  - split: NanoSciFact
    path: corpus/NanoSciFact-*
  - split: NanoTouche2020
    path: corpus/NanoTouche2020-*
- config_name: qrels
  data_files:
  - split: NanoArguAna
    path: qrels/NanoArguAna-*
  - split: NanoClimateFEVER
    path: qrels/NanoClimateFEVER-*
  - split: NanoDBPedia
    path: qrels/NanoDBPedia-*
  - split: NanoFEVER
    path: qrels/NanoFEVER-*
  - split: NanoFiQA2018
    path: qrels/NanoFiQA2018-*
  - split: NanoHotpotQA
    path: qrels/NanoHotpotQA-*
  - split: NanoMSMARCO
    path: qrels/NanoMSMARCO-*
  - split: NanoNFCorpus
    path: qrels/NanoNFCorpus-*
  - split: NanoNQ
    path: qrels/NanoNQ-*
  - split: NanoQuoraRetrieval
    path: qrels/NanoQuoraRetrieval-*
  - split: NanoSCIDOCS
    path: qrels/NanoSCIDOCS-*
  - split: NanoSciFact
    path: qrels/NanoSciFact-*
  - split: NanoTouche2020
    path: qrels/NanoTouche2020-*
- config_name: queries
  data_files:
  - split: NanoArguAna
    path: queries/NanoArguAna-*
  - split: NanoClimateFEVER
    path: queries/NanoClimateFEVER-*
  - split: NanoDBPedia
    path: queries/NanoDBPedia-*
  - split: NanoFEVER
    path: queries/NanoFEVER-*
  - split: NanoFiQA2018
    path: queries/NanoFiQA2018-*
  - split: NanoHotpotQA
    path: queries/NanoHotpotQA-*
  - split: NanoMSMARCO
    path: queries/NanoMSMARCO-*
  - split: NanoNFCorpus
    path: queries/NanoNFCorpus-*
  - split: NanoNQ
    path: queries/NanoNQ-*
  - split: NanoQuoraRetrieval
    path: queries/NanoQuoraRetrieval-*
  - split: NanoSCIDOCS
    path: queries/NanoSCIDOCS-*
  - split: NanoSciFact
    path: queries/NanoSciFact-*
  - split: NanoTouche2020
    path: queries/NanoTouche2020-*
language:
- fra
---

## Description

NanoBEIR translated into French (Microsoft API used).

In this [collection](https://huggingface.co/collections/CATIE-AQ/nanobeir-fr) we translate the datasets while preserving the original formatting of [zeta-alpha-ai](https://huggingface.co/collections/zeta-alpha-ai/nanobeir). Here, by contrast, the format was designed with Tom Aarsen to be compatible with the `Sentence Transformers` library. This dataset contains the 13 datasets that make up NanoBEIR.

We have also added a `bm25` configuration, calculated with the `bm25s` library using the same methodology as Tom Aarsen for the [English dataset](https://huggingface.co/datasets/sentence-transformers/NanoBEIR-en). As a result, in addition to being compatible with `SparseEncoder` and `SentenceTransformer` models, the dataset is also compatible with `CrossEncoder` models (the `CrossEncoderNanoBEIREvaluator` requires BM25 data).

Finally, if you are interested in languages other than French, we invite you to check out this [collection](https://huggingface.co/collections/sentence-transformers/nanobeir-datasets).
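To make the relationship between the `bm25` and `qrels` configurations more concrete, here is a minimal sketch that scores BM25 candidate lists against relevance judgments using recall@k. It is an illustration, not part of the dataset card: the helper names (`load_split`, `build_qrels`, `recall_at_k`) are hypothetical, the toy rows are invented and only mimic the column names listed in the metadata, and it assumes that `corpus-ids` is ordered from best to worst BM25 candidate, which the card does not state explicitly. `load_split` relies on `datasets.load_dataset` and needs network access, so it is defined but not called here.

```python
from collections import defaultdict


def load_split(config, split, repo="tomaarsen/NanoBEIR-fr-copy"):
    """Fetch one split of one config from the Hub (needs network and `datasets`)."""
    # Imported lazily so the pure-Python helpers below also work offline.
    from datasets import load_dataset
    return load_dataset(repo, config, split=split)


def build_qrels(qrels_rows):
    """Group (query-id, corpus-id) rows into {query_id: set of relevant corpus ids}."""
    relevant = defaultdict(set)
    for row in qrels_rows:
        relevant[row["query-id"]].add(row["corpus-id"])
    return dict(relevant)


def recall_at_k(bm25_rows, relevant, k):
    """Mean fraction of each query's relevant documents found in its first k candidates."""
    scores = []
    for row in bm25_rows:
        gold = relevant.get(row["query-id"], set())
        if not gold:
            continue  # skip queries that have no relevance judgments
        top_k = set(row["corpus-ids"][:k])  # assumption: list is ranked best-first
        scores.append(len(gold & top_k) / len(gold))
    return sum(scores) / len(scores) if scores else 0.0


# Toy rows, invented for illustration; they only mimic the column names above.
toy_qrels = [
    {"query-id": "q1", "corpus-id": "d1"},
    {"query-id": "q1", "corpus-id": "d4"},
    {"query-id": "q2", "corpus-id": "d2"},
]
toy_bm25 = [
    {"query-id": "q1", "corpus-ids": ["d1", "d3", "d4"]},
    {"query-id": "q2", "corpus-ids": ["d5", "d2", "d6"]},
]

relevant = build_qrels(toy_qrels)
for k in (1, 2, 3):
    print(f"recall@{k} = {recall_at_k(toy_bm25, relevant, k)}")
```

With the real data, the same helpers could be applied to `load_split("qrels", "NanoNQ")` and `load_split("bm25", "NanoNQ")`, since iterating over a loaded split yields one dictionary per row.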
Provided by:
tomaarsen