tomaarsen/NanoBEIR-ja
Source: Hugging Face · Updated: 2025-12-10 · Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/tomaarsen/NanoBEIR-ja
Resource summary:
---
dataset_info:
- config_name: corpus
  features:
  - name: _id
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: NanoClimateFEVER
    num_bytes: 6537653
    num_examples: 3408
  - name: NanoDBPedia
    num_bytes: 3125718
    num_examples: 6045
  - name: NanoFEVER
    num_bytes: 8198841
    num_examples: 4996
  - name: NanoFiQA2018
    num_bytes: 5677326
    num_examples: 4598
  - name: NanoHotpotQA
    num_bytes: 2652164
    num_examples: 5090
  - name: NanoMSMARCO
    num_bytes: 2203454
    num_examples: 5043
  - name: NanoNFCorpus
    num_bytes: 5296700
    num_examples: 2953
  - name: NanoNQ
    num_bytes: 3539146
    num_examples: 5035
  - name: NanoQuoraRetrieval
    num_bytes: 536226
    num_examples: 5046
  - name: NanoSCIDOCS
    num_bytes: 2599191
    num_examples: 2210
  - name: NanoArguAna
    num_bytes: 4725257
    num_examples: 3635
  - name: NanoSciFact
    num_bytes: 5036450
    num_examples: 2919
  - name: NanoTouche2020
    num_bytes: 15382584
    num_examples: 5745
  download_size: 35840420
  dataset_size: 65510710
- config_name: qrels
  features:
  - name: query-id
    dtype: string
  - name: corpus-id
    dtype: string
  splits:
  - name: NanoClimateFEVER
    num_bytes: 4361
    num_examples: 148
  - name: NanoDBPedia
    num_bytes: 60640
    num_examples: 1158
  - name: NanoFEVER
    num_bytes: 1630
    num_examples: 57
  - name: NanoFiQA2018
    num_bytes: 2200
    num_examples: 123
  - name: NanoHotpotQA
    num_bytes: 3885
    num_examples: 100
  - name: NanoMSMARCO
    num_bytes: 1065
    num_examples: 50
  - name: NanoNFCorpus
    num_bytes: 64851
    num_examples: 2518
  - name: NanoNQ
    num_bytes: 1340
    num_examples: 57
  - name: NanoQuoraRetrieval
    num_bytes: 1359
    num_examples: 70
  - name: NanoSCIDOCS
    num_bytes: 21472
    num_examples: 244
  - name: NanoArguAna
    num_bytes: 3496
    num_examples: 50
  - name: NanoSciFact
    num_bytes: 1054
    num_examples: 56
  - name: NanoTouche2020
    num_bytes: 45452
    num_examples: 932
  download_size: 91853
  dataset_size: 212805
- config_name: queries
  features:
  - name: _id
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: NanoClimateFEVER
    num_bytes: 8888
    num_examples: 50
  - name: NanoDBPedia
    num_bytes: 4411
    num_examples: 50
  - name: NanoFEVER
    num_bytes: 4533
    num_examples: 50
  - name: NanoFiQA2018
    num_bytes: 4775
    num_examples: 50
  - name: NanoHotpotQA
    num_bytes: 8060
    num_examples: 50
  - name: NanoMSMARCO
    num_bytes: 4492
    num_examples: 50
  - name: NanoNFCorpus
    num_bytes: 2447
    num_examples: 50
  - name: NanoNQ
    num_bytes: 6622
    num_examples: 50
  - name: NanoQuoraRetrieval
    num_bytes: 4630
    num_examples: 50
  - name: NanoSCIDOCS
    num_bytes: 6684
    num_examples: 50
  - name: NanoArguAna
    num_bytes: 77515
    num_examples: 50
  - name: NanoSciFact
    num_bytes: 6273
    num_examples: 50
  - name: NanoTouche2020
    num_bytes: 3644
    num_examples: 49
  download_size: 116820
  dataset_size: 142974
configs:
- config_name: corpus
  data_files:
  - split: NanoClimateFEVER
    path: corpus/NanoClimateFEVER-*
  - split: NanoDBPedia
    path: corpus/NanoDBPedia-*
  - split: NanoFEVER
    path: corpus/NanoFEVER-*
  - split: NanoFiQA2018
    path: corpus/NanoFiQA2018-*
  - split: NanoHotpotQA
    path: corpus/NanoHotpotQA-*
  - split: NanoMSMARCO
    path: corpus/NanoMSMARCO-*
  - split: NanoNFCorpus
    path: corpus/NanoNFCorpus-*
  - split: NanoNQ
    path: corpus/NanoNQ-*
  - split: NanoQuoraRetrieval
    path: corpus/NanoQuoraRetrieval-*
  - split: NanoSCIDOCS
    path: corpus/NanoSCIDOCS-*
  - split: NanoArguAna
    path: corpus/NanoArguAna-*
  - split: NanoSciFact
    path: corpus/NanoSciFact-*
  - split: NanoTouche2020
    path: corpus/NanoTouche2020-*
- config_name: qrels
  data_files:
  - split: NanoClimateFEVER
    path: qrels/NanoClimateFEVER-*
  - split: NanoDBPedia
    path: qrels/NanoDBPedia-*
  - split: NanoFEVER
    path: qrels/NanoFEVER-*
  - split: NanoFiQA2018
    path: qrels/NanoFiQA2018-*
  - split: NanoHotpotQA
    path: qrels/NanoHotpotQA-*
  - split: NanoMSMARCO
    path: qrels/NanoMSMARCO-*
  - split: NanoNFCorpus
    path: qrels/NanoNFCorpus-*
  - split: NanoNQ
    path: qrels/NanoNQ-*
  - split: NanoQuoraRetrieval
    path: qrels/NanoQuoraRetrieval-*
  - split: NanoSCIDOCS
    path: qrels/NanoSCIDOCS-*
  - split: NanoArguAna
    path: qrels/NanoArguAna-*
  - split: NanoSciFact
    path: qrels/NanoSciFact-*
  - split: NanoTouche2020
    path: qrels/NanoTouche2020-*
- config_name: queries
  data_files:
  - split: NanoClimateFEVER
    path: queries/NanoClimateFEVER-*
  - split: NanoDBPedia
    path: queries/NanoDBPedia-*
  - split: NanoFEVER
    path: queries/NanoFEVER-*
  - split: NanoFiQA2018
    path: queries/NanoFiQA2018-*
  - split: NanoHotpotQA
    path: queries/NanoHotpotQA-*
  - split: NanoMSMARCO
    path: queries/NanoMSMARCO-*
  - split: NanoNFCorpus
    path: queries/NanoNFCorpus-*
  - split: NanoNQ
    path: queries/NanoNQ-*
  - split: NanoQuoraRetrieval
    path: queries/NanoQuoraRetrieval-*
  - split: NanoSCIDOCS
    path: queries/NanoSCIDOCS-*
  - split: NanoArguAna
    path: queries/NanoArguAna-*
  - split: NanoSciFact
    path: queries/NanoSciFact-*
  - split: NanoTouche2020
    path: queries/NanoTouche2020-*
  default: true
---
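The per-split `num_bytes` values in the metadata above can be cross-checked against each configuration's `dataset_size`. The following is a minimal Python sketch of that check, with the figures copied from the metadata; the dictionary and function names are illustrative and not part of the dataset.

```python
# Per-split num_bytes copied from the dataset_info metadata, in listed order.
NUM_BYTES = {
    "corpus": [6537653, 3125718, 8198841, 5677326, 2652164, 2203454, 5296700,
               3539146, 536226, 2599191, 4725257, 5036450, 15382584],
    "qrels": [4361, 60640, 1630, 2200, 3885, 1065, 64851,
              1340, 1359, 21472, 3496, 1054, 45452],
    "queries": [8888, 4411, 4533, 4775, 8060, 4492, 2447,
                6622, 4630, 6684, 77515, 6273, 3644],
}

# dataset_size reported for each configuration in the same metadata.
DATASET_SIZE = {"corpus": 65510710, "qrels": 212805, "queries": 142974}


def size_matches(config: str) -> bool:
    """Return True if the split sizes add up to the reported dataset_size."""
    return sum(NUM_BYTES[config]) == DATASET_SIZE[config]


for config in NUM_BYTES:
    print(config, size_matches(config))
```

The `download_size` values are deliberately left out of the check: they are smaller than `dataset_size` in all three configurations, so they evidently measure something other than the sum of the split sizes.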
Dataset information:
- Configuration: corpus
  Features:
  - Name: _id, data type: string
  - Name: text, data type: string
  Splits:
  - NanoClimateFEVER: 6537653 bytes, 3408 examples
  - NanoDBPedia: 3125718 bytes, 6045 examples
  - NanoFEVER: 8198841 bytes, 4996 examples
  - NanoFiQA2018: 5677326 bytes, 4598 examples
  - NanoHotpotQA: 2652164 bytes, 5090 examples
  - NanoMSMARCO: 2203454 bytes, 5043 examples
  - NanoNFCorpus: 5296700 bytes, 2953 examples
  - NanoNQ: 3539146 bytes, 5035 examples
  - NanoQuoraRetrieval: 536226 bytes, 5046 examples
  - NanoSCIDOCS: 2599191 bytes, 2210 examples
  - NanoArguAna: 4725257 bytes, 3635 examples
  - NanoSciFact: 5036450 bytes, 2919 examples
  - NanoTouche2020: 15382584 bytes, 5745 examples
  Download size: 35840420 bytes
  Dataset size: 65510710 bytes
- Configuration: qrels (relevance judgments)
  Features:
  - Name: query-id, data type: string
  - Name: corpus-id, data type: string
  Splits:
  - NanoClimateFEVER: 4361 bytes, 148 examples
  - NanoDBPedia: 60640 bytes, 1158 examples
  - NanoFEVER: 1630 bytes, 57 examples
  - NanoFiQA2018: 2200 bytes, 123 examples
  - NanoHotpotQA: 3885 bytes, 100 examples
  - NanoMSMARCO: 1065 bytes, 50 examples
  - NanoNFCorpus: 64851 bytes, 2518 examples
  - NanoNQ: 1340 bytes, 57 examples
  - NanoQuoraRetrieval: 1359 bytes, 70 examples
  - NanoSCIDOCS: 21472 bytes, 244 examples
  - NanoArguAna: 3496 bytes, 50 examples
  - NanoSciFact: 1054 bytes, 56 examples
  - NanoTouche2020: 45452 bytes, 932 examples
  Download size: 91853 bytes
  Dataset size: 212805 bytes
- Configuration: queries
  Features:
  - Name: _id, data type: string
  - Name: text, data type: string
  Splits:
  - NanoClimateFEVER: 8888 bytes, 50 examples
  - NanoDBPedia: 4411 bytes, 50 examples
  - NanoFEVER: 4533 bytes, 50 examples
  - NanoFiQA2018: 4775 bytes, 50 examples
  - NanoHotpotQA: 8060 bytes, 50 examples
  - NanoMSMARCO: 4492 bytes, 50 examples
  - NanoNFCorpus: 2447 bytes, 50 examples
  - NanoNQ: 6622 bytes, 50 examples
  - NanoQuoraRetrieval: 4630 bytes, 50 examples
  - NanoSCIDOCS: 6684 bytes, 50 examples
  - NanoArguAna: 77515 bytes, 50 examples
  - NanoSciFact: 6273 bytes, 50 examples
  - NanoTouche2020: 3644 bytes, 49 examples
  Download size: 116820 bytes
  Dataset size: 142974 bytes
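Each `qrels` row pairs a `query-id` with one `corpus-id`, so a query with several relevant documents spans several rows (for example, the NanoNFCorpus split lists 2518 rows for 50 queries). The following is a minimal sketch of grouping such rows per query and scoring a ranking with recall@k. The rows, IDs, and ranking are made up for illustration, and the helper names are hypothetical rather than part of the dataset or any library.

```python
from collections import defaultdict


def group_qrels(rows):
    """Map each query-id to the set of corpus-ids judged relevant for it."""
    relevant = defaultdict(set)
    for row in rows:
        relevant[row["query-id"]].add(row["corpus-id"])
    return dict(relevant)


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k ranking."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


# Toy rows in the same two-column shape as the qrels configuration (made up).
toy_qrels = [
    {"query-id": "q1", "corpus-id": "d3"},
    {"query-id": "q1", "corpus-id": "d7"},
    {"query-id": "q2", "corpus-id": "d1"},
]

relevant = group_qrels(toy_qrels)
ranking_q1 = ["d7", "d2", "d3", "d9"]  # hypothetical retriever output for q1
print(recall_at_k(ranking_q1, relevant["q1"], k=2))
```

The sketch treats every listed pair as relevant, which matches the two-column schema above: the `qrels` configuration carries no separate score column.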
Configurations:
- Configuration: corpus
  Data files:
  - Split: NanoClimateFEVER, path: corpus/NanoClimateFEVER-*
  - Split: NanoDBPedia, path: corpus/NanoDBPedia-*
  - Split: NanoFEVER, path: corpus/NanoFEVER-*
  - Split: NanoFiQA2018, path: corpus/NanoFiQA2018-*
  - Split: NanoHotpotQA, path: corpus/NanoHotpotQA-*
  - Split: NanoMSMARCO, path: corpus/NanoMSMARCO-*
  - Split: NanoNFCorpus, path: corpus/NanoNFCorpus-*
  - Split: NanoNQ, path: corpus/NanoNQ-*
  - Split: NanoQuoraRetrieval, path: corpus/NanoQuoraRetrieval-*
  - Split: NanoSCIDOCS, path: corpus/NanoSCIDOCS-*
  - Split: NanoArguAna, path: corpus/NanoArguAna-*
  - Split: NanoSciFact, path: corpus/NanoSciFact-*
  - Split: NanoTouche2020, path: corpus/NanoTouche2020-*
- Configuration: qrels
  Data files:
  - Split: NanoClimateFEVER, path: qrels/NanoClimateFEVER-*
  - Split: NanoDBPedia, path: qrels/NanoDBPedia-*
  - Split: NanoFEVER, path: qrels/NanoFEVER-*
  - Split: NanoFiQA2018, path: qrels/NanoFiQA2018-*
  - Split: NanoHotpotQA, path: qrels/NanoHotpotQA-*
  - Split: NanoMSMARCO, path: qrels/NanoMSMARCO-*
  - Split: NanoNFCorpus, path: qrels/NanoNFCorpus-*
  - Split: NanoNQ, path: qrels/NanoNQ-*
  - Split: NanoQuoraRetrieval, path: qrels/NanoQuoraRetrieval-*
  - Split: NanoSCIDOCS, path: qrels/NanoSCIDOCS-*
  - Split: NanoArguAna, path: qrels/NanoArguAna-*
  - Split: NanoSciFact, path: qrels/NanoSciFact-*
  - Split: NanoTouche2020, path: qrels/NanoTouche2020-*
- Configuration: queries
  Data files:
  - Split: NanoClimateFEVER, path: queries/NanoClimateFEVER-*
  - Split: NanoDBPedia, path: queries/NanoDBPedia-*
  - Split: NanoFEVER, path: queries/NanoFEVER-*
  - Split: NanoFiQA2018, path: queries/NanoFiQA2018-*
  - Split: NanoHotpotQA, path: queries/NanoHotpotQA-*
  - Split: NanoMSMARCO, path: queries/NanoMSMARCO-*
  - Split: NanoNFCorpus, path: queries/NanoNFCorpus-*
  - Split: NanoNQ, path: queries/NanoNQ-*
  - Split: NanoQuoraRetrieval, path: queries/NanoQuoraRetrieval-*
  - Split: NanoSCIDOCS, path: queries/NanoSCIDOCS-*
  - Split: NanoArguAna, path: queries/NanoArguAna-*
  - Split: NanoSciFact, path: queries/NanoSciFact-*
  - Split: NanoTouche2020, path: queries/NanoTouche2020-*
  Default configuration: enabled
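Each configuration exposes one split per subset, so a subset is selected by passing both the configuration name and the split name. The following is a hedged sketch using `load_dataset` from the Hugging Face `datasets` library. The repository ID is inferred from the page title, the helper name `load_subset` is hypothetical, and calling it requires the `datasets` package and network access.

```python
# Configuration and split names as listed in the metadata above.
CONFIGS = ("corpus", "queries", "qrels")
SPLITS = (
    "NanoClimateFEVER", "NanoDBPedia", "NanoFEVER", "NanoFiQA2018",
    "NanoHotpotQA", "NanoMSMARCO", "NanoNFCorpus", "NanoNQ",
    "NanoQuoraRetrieval", "NanoSCIDOCS", "NanoArguAna", "NanoSciFact",
    "NanoTouche2020",
)


def load_subset(config, split, repo_id="tomaarsen/NanoBEIR-ja"):
    """Load one split of one configuration (repo_id is an assumption)."""
    if config not in CONFIGS:
        raise ValueError(f"unknown configuration: {config!r}")
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    # Imported lazily so the name checks above work without the package.
    from datasets import load_dataset
    return load_dataset(repo_id, config, split=split)


# Example usage (requires network access, so it is left commented out):
# corpus = load_subset("corpus", "NanoSciFact")
# queries = load_subset("queries", "NanoSciFact")
# qrels = load_subset("qrels", "NanoSciFact")
# print(corpus[0]["_id"], queries[0]["text"], qrels[0]["query-id"])
```

Passing the configuration explicitly avoids relying on whichever configuration is marked as the default.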
Provider:
tomaarsen



