karynaur/xpr_colbert
Source: Hugging Face. Updated: 2024-02-08. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/karynaur/xpr_colbert
Resource description:
---
dataset_info:
- config_name: ar
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 186925115.8424133
    num_examples: 315257
  - name: test
    num_bytes: 80111272.15758668
    num_examples: 135111
  download_size: 168705698
  dataset_size: 267036388.0
- config_name: de
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 72613285.59577411
    num_examples: 144211
  - name: test
    num_bytes: 31120123.404225886
    num_examples: 61805
  download_size: 77074916
  dataset_size: 103733409.0
- config_name: es
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 44871811.2738276
    num_examples: 99590
  - name: test
    num_bytes: 19231033.726172403
    num_examples: 42682
  download_size: 46570628
  dataset_size: 64102845.0
- config_name: fr
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 41916606.86744066
    num_examples: 98156
  - name: test
    num_bytes: 17964748.132559333
    num_examples: 42068
  download_size: 43654468
  dataset_size: 59881355.0
- config_name: ja
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 609278382.3
    num_examples: 1100960
  - name: test
    num_bytes: 261119306.7
    num_examples: 471840
  download_size: 625161112
  dataset_size: 870397689.0
- config_name: ko
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 77171323.53466085
    num_examples: 159667
  - name: test
    num_bytes: 33073562.465339154
    num_examples: 68429
  download_size: 80643421
  dataset_size: 110244886.0
- config_name: ro
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 6360407.250186012
    num_examples: 15052
  - name: test
    num_bytes: 2726371.749813988
    num_examples: 6452
  download_size: 6335805
  dataset_size: 9086779.0
- config_name: ru
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 243178333.3
    num_examples: 390432
  - name: test
    num_bytes: 104219285.7
    num_examples: 167328
  download_size: 225381017
  dataset_size: 347397619.0
- config_name: zh
  features:
  - name: query
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 311415676.235992
    num_examples: 621644
  - name: test
    num_bytes: 133464433.764008
    num_examples: 266420
  download_size: 353784547
  dataset_size: 444880110.0
configs:
- config_name: ar
  data_files:
  - split: train
    path: ar/train-*
  - split: test
    path: ar/test-*
- config_name: de
  data_files:
  - split: train
    path: de/train-*
  - split: test
    path: de/test-*
- config_name: es
  data_files:
  - split: train
    path: es/train-*
  - split: test
    path: es/test-*
- config_name: fr
  data_files:
  - split: train
    path: fr/train-*
  - split: test
    path: fr/test-*
- config_name: ja
  data_files:
  - split: train
    path: ja/train-*
  - split: test
    path: ja/test-*
- config_name: ko
  data_files:
  - split: train
    path: ko/train-*
  - split: test
    path: ko/test-*
- config_name: ro
  data_files:
  - split: train
    path: ro/train-*
  - split: test
    path: ro/test-*
- config_name: ru
  data_files:
  - split: train
    path: ru/train-*
  - split: test
    path: ru/test-*
- config_name: zh
  data_files:
  - split: train
    path: zh/train-*
  - split: test
    path: zh/test-*
---
Provider:
karynaur
Summary of original information
Dataset overview
Dataset configurations
Arabic (ar)
- Features:
  - query: string
  - positive: string
  - negative: string
- Splits:
  - train:
    - Bytes: 186925115.8424133
    - Examples: 315257
  - test:
    - Bytes: 80111272.15758668
    - Examples: 135111
- Download size: 168705698
- Dataset size: 267036388.0
German (de)
- Features:
  - query: string
  - positive: string
  - negative: string
- Splits:
  - train:
    - Bytes: 72613285.59577411
    - Examples: 144211
  - test:
    - Bytes: 31120123.404225886
    - Examples: 61805
- Download size: 77074916
- Dataset size: 103733409.0
Spanish (es)
- Features:
  - query: string
  - positive: string
  - negative: string
- Splits:
  - train:
    - Bytes: 44871811.2738276
    - Examples: 99590
  - test:
    - Bytes: 19231033.726172403
    - Examples: 42682
- Download size: 46570628
- Dataset size: 64102845.0
French (fr)
- Features:
  - query: string
  - positive: string
  - negative: string
- Splits:
  - train:
    - Bytes: 41916606.86744066
    - Examples: 98156
  - test:
    - Bytes: 17964748.132559333
    - Examples: 42068
- Download size: 43654468
- Dataset size: 59881355.0
Japanese (ja)
- Features:
  - query: string
  - positive: string
  - negative: string
- Splits:
  - train:
    - Bytes: 609278382.3
    - Examples: 1100960
  - test:
    - Bytes: 261119306.7
    - Examples: 471840
- Download size: 625161112
- Dataset size: 870397689.0
Korean (ko)
- Features:
  - query: string
  - positive: string
  - negative: string
- Splits:
  - train:
    - Bytes: 77171323.53466085
    - Examples: 159667
  - test:
    - Bytes: 33073562.465339154
    - Examples: 68429
- Download size: 80643421
- Dataset size: 110244886.0
Romanian (ro)
- Features:
  - query: string
  - positive: string
  - negative: string
- Splits:
  - train:
    - Bytes: 6360407.250186012
    - Examples: 15052
  - test:
    - Bytes: 2726371.749813988
    - Examples: 6452
- Download size: 6335805
- Dataset size: 9086779.0
Russian (ru)
- Features:
  - query: string
  - positive: string
  - negative: string
- Splits:
  - train:
    - Bytes: 243178333.3
    - Examples: 390432
  - test:
    - Bytes: 104219285.7
    - Examples: 167328
- Download size: 225381017
- Dataset size: 347397619.0
Chinese (zh)
- Features:
  - query: string
  - positive: string
  - negative: string
- Splits:
  - train:
    - Bytes: 311415676.235992
    - Examples: 621644
  - test:
    - Bytes: 133464433.764008
    - Examples: 266420
- Download size: 353784547
- Dataset size: 444880110.0
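
The example counts in the configuration summary above can be checked with a little arithmetic. The sketch below copies the train and test counts for each configuration and computes the share of examples that falls in the train split. The source does not say how the splits were produced, so the roughly 70/30 ratio that comes out is an observation from the numbers, not a documented property. The names `SPLIT_COUNTS`, `train_fraction`, and `summarize` are illustrative and not part of the dataset.

```python
# (train, test) example counts copied from the configuration summary.
SPLIT_COUNTS = {
    "ar": (315257, 135111),
    "de": (144211, 61805),
    "es": (99590, 42682),
    "fr": (98156, 42068),
    "ja": (1100960, 471840),
    "ko": (159667, 68429),
    "ro": (15052, 6452),
    "ru": (390432, 167328),
    "zh": (621644, 266420),
}


def train_fraction(train: int, test: int) -> float:
    """Return the share of examples that belong to the train split."""
    return train / (train + test)


def summarize(counts: dict) -> dict:
    """Map each configuration name to its train fraction, rounded to 4 places."""
    return {
        lang: round(train_fraction(train, test), 4)
        for lang, (train, test) in counts.items()
    }


if __name__ == "__main__":
    for lang, fraction in summarize(SPLIT_COUNTS).items():
        print(f"{lang}: train fraction = {fraction}")
```

Every configuration comes out within a small rounding margin of 0.7, which is consistent with the fractional `num_bytes` values listed for each split.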
Data file paths
- Arabic (ar):
  - train: ar/train-*
  - test: ar/test-*
- German (de):
  - train: de/train-*
  - test: de/test-*
- Spanish (es):
  - train: es/train-*
  - test: es/test-*
- French (fr):
  - train: fr/train-*
  - test: fr/test-*
- Japanese (ja):
  - train: ja/train-*
  - test: ja/test-*
- Korean (ko):
  - train: ko/train-*
  - test: ko/test-*
- Romanian (ro):
  - train: ro/train-*
  - test: ro/test-*
- Russian (ru):
  - train: ru/train-*
  - test: ru/test-*
- Chinese (zh):
  - train: zh/train-*
  - test: zh/test-*
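
The path patterns listed above follow one rule, `<config>/<split>-*`, so a single configuration and split can be selected by name. The sketch below rebuilds that pattern, checks a file path against it, and shows one way a split might be loaded with the Hugging Face `datasets` library, assuming that library is installed and the repository is reachable. The helper names (`split_pattern`, `matches_split`, `load_split`) are illustrative, and the shard file name used in the usage note is hypothetical, since the listing only gives the glob patterns.

```python
from fnmatch import fnmatch

REPO_ID = "karynaur/xpr_colbert"
CONFIGS = ("ar", "de", "es", "fr", "ja", "ko", "ro", "ru", "zh")
SPLITS = ("train", "test")


def split_pattern(config: str, split: str) -> str:
    """Build the glob pattern listed for a configuration and split."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return f"{config}/{split}-*"


def matches_split(path: str, config: str, split: str) -> bool:
    """Check whether a repository file path belongs to the given split."""
    return fnmatch(path, split_pattern(config, split))


def load_split(config: str, split: str = "train"):
    """Load one split of one configuration (needs `datasets` and network access)."""
    # Imported lazily so the helpers above work without the third-party package.
    from datasets import load_dataset

    split_pattern(config, split)  # validates both names before downloading
    return load_dataset(REPO_ID, config, split=split)
```

For example, `matches_split("ar/train-00000-of-00001.parquet", "ar", "train")` evaluates to true for that hypothetical shard name, and `load_split("ro", "test")` would fetch the smallest configuration. Each loaded row is expected to carry the three string fields listed in the summary: `query`, `positive`, and `negative`.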



