nthakur/xtreme-up-retrieval-cross-lang
Source: Hugging Face. Updated 2024-04-27; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/nthakur/xtreme-up-retrieval-cross-lang
Resource description:
---
dataset_info:
- config_name: as
features:
- name: _id
dtype: string
- name: query
dtype: string
- name: title
dtype: string
- name: text
dtype: string
- name: language
dtype: string
- name: code
dtype: string
splits:
- name: train
num_bytes: 186950
num_examples: 227
- name: validation
num_bytes: 188857
num_examples: 260
- name: test
num_bytes: 440015
num_examples: 537
download_size: 1074315
dataset_size: 815822
- config_name: bho
features:
- name: _id
dtype: string
- name: query
dtype: string
- name: title
dtype: string
- name: text
dtype: string
- name: language
dtype: string
- name: code
dtype: string
splits:
- name: train
num_bytes: 184034
num_examples: 230
- name: validation
num_bytes: 182929
num_examples: 262
- name: test
num_bytes: 426805
num_examples: 535
download_size: 944336
dataset_size: 793768
- config_name: brx
features:
- name: _id
dtype: string
- name: query
dtype: string
- name: title
dtype: string
- name: text
dtype: string
- name: language
dtype: string
- name: code
dtype: string
splits:
- name: train
num_bytes: 185784
num_examples: 230
- name: validation
num_bytes: 185331
num_examples: 261
- name: test
num_bytes: 433560
num_examples: 537
download_size: 1070850
dataset_size: 804675
- config_name: gbm
features:
- name: _id
dtype: string
- name: query
dtype: string
- name: title
dtype: string
- name: text
dtype: string
- name: language
dtype: string
- name: code
dtype: string
splits:
- name: train
num_bytes: 185208
num_examples: 230
- name: validation
num_bytes: 184119
num_examples: 261
- name: test
num_bytes: 426897
num_examples: 538
download_size: 946654
dataset_size: 796224
- config_name: gom
features:
- name: _id
dtype: string
- name: query
dtype: string
- name: title
dtype: string
- name: text
dtype: string
- name: language
dtype: string
- name: code
dtype: string
splits:
- name: train
num_bytes: 194506
num_examples: 235
- name: validation
num_bytes: 192119
num_examples: 269
- name: test
num_bytes: 432531
num_examples: 533
download_size: 964906
dataset_size: 819156
- config_name: gu
features:
- name: _id
dtype: string
- name: query
dtype: string
- name: title
dtype: string
- name: text
dtype: string
- name: language
dtype: string
- name: code
dtype: string
splits:
- name: train
num_bytes: 194290
num_examples: 241
- name: validation
num_bytes: 193151
num_examples: 275
- name: test
num_bytes: 426459
num_examples: 534
download_size: 971538
dataset_size: 813900
- config_name: hi
features:
- name: _id
dtype: string
- name: query
dtype: string
- name: title
dtype: string
- name: text
dtype: string
- name: language
dtype: string
- name: code
dtype: string
splits:
- name: train
num_bytes: 219153
num_examples: 278
- name: validation
num_bytes: 192465
num_examples: 276
- name: test
num_bytes: 429235
num_examples: 539
download_size: 1003624
dataset_size: 840853
- config_name: hne
features:
- name: _id
dtype: string
- name: query
dtype: string
- name: title
dtype: string
- name: text
dtype: string
- name: language
dtype: string
- name: code
dtype: string
splits:
- name: train
num_bytes: 221342
num_examples: 277
- name: validation
num_bytes: 196040
num_examples: 276
- name: test
num_bytes: 430852
num_examples: 533
download_size: 1005200
dataset_size: 848234
- config_name: kn
features:
- name: _id
dtype: string
- name: query
dtype: string
- name: title
dtype: string
- name: text
dtype: string
- name: language
dtype: string
- name: code
dtype: string
splits:
- name: train
num_bytes: 221147
num_examples: 278
- name: validation
num_bytes: 196488
num_examples: 277
- name: test
num_bytes: 428691
num_examples: 535
download_size: 1007264
dataset_size: 846326
- config_name: mai
features:
- name: _id
dtype: string
- name: query
dtype: string
- name: title
dtype: string
- name: text
dtype: string
- name: language
dtype: string
- name: code
dtype: string
splits:
- name: train
num_bytes: 223440
num_examples: 276
- name: validation
num_bytes: 196049
num_examples: 273
- name: test
num_bytes: 431545
num_examples: 539
download_size: 1005560
dataset_size: 851034
- config_name: ml
features:
- name: _id
dtype: string
- name: query
dtype: string
- name: title
dtype: string
- name: text
dtype: string
- name: language
dtype: string
- name: code
dtype: string
splits:
- name: train
num_bytes: 225396
num_examples: 278
- name: validation
num_bytes: 200574
num_examples: 277
- name: test
num_bytes: 436795
num_examples: 533
download_size: 1014162
dataset_size: 862765
- config_name: mni
features:
- name: _id
dtype: string
- name: query
dtype: string
- name: title
dtype: string
- name: text
dtype: string
- name: language
dtype: string
- name: code
dtype: string
splits:
- name: train
num_bytes: 225672
num_examples: 278
- name: validation
num_bytes: 206226
num_examples: 284
- name: test
num_bytes: 440315
num_examples: 532
download_size: 1909577
dataset_size: 872213
- config_name: mr
features:
- name: _id
dtype: string
- name: query
dtype: string
- name: title
dtype: string
- name: text
dtype: string
- name: language
dtype: string
- name: code
dtype: string
splits:
- name: train
num_bytes: 219673
num_examples: 276
- name: validation
num_bytes: 202734
num_examples: 289
- name: test
num_bytes: 429963
num_examples: 537
download_size: 1016040
dataset_size: 852370
- config_name: mwr
features:
- name: _id
dtype: string
- name: query
dtype: string
- name: title
dtype: string
- name: text
dtype: string
- name: language
dtype: string
- name: code
dtype: string
splits:
- name: train
num_bytes: 217962
num_examples: 279
- name: validation
num_bytes: 200319
num_examples: 290
- name: test
num_bytes: 424328
num_examples: 536
download_size: 1012502
dataset_size: 842609
- config_name: or
features:
- name: _id
dtype: string
- name: query
dtype: string
- name: title
dtype: string
- name: text
dtype: string
- name: language
dtype: string
- name: code
dtype: string
splits:
- name: train
num_bytes: 221936
num_examples: 278
- name: validation
num_bytes: 206278
num_examples: 290
- name: test
num_bytes: 422930
num_examples: 536
download_size: 1017246
dataset_size: 851144
- config_name: pa
features:
- name: _id
dtype: string
- name: query
dtype: string
- name: title
dtype: string
- name: text
dtype: string
- name: language
dtype: string
- name: code
dtype: string
splits:
- name: train
num_bytes: 217525
num_examples: 277
- name: validation
num_bytes: 201150
num_examples: 289
- name: test
num_bytes: 418441
num_examples: 528
download_size: 1003902
dataset_size: 837116
- config_name: ps
features:
- name: _id
dtype: string
- name: query
dtype: string
- name: title
dtype: string
- name: text
dtype: string
- name: language
dtype: string
- name: code
dtype: string
splits:
- name: train
num_bytes: 206474
num_examples: 277
- name: validation
num_bytes: 188937
num_examples: 288
- name: test
num_bytes: 404831
num_examples: 537
download_size: 993728
dataset_size: 800242
- config_name: sa
features:
- name: _id
dtype: string
- name: query
dtype: string
- name: title
dtype: string
- name: text
dtype: string
- name: language
dtype: string
- name: code
dtype: string
splits:
- name: train
num_bytes: 221586
num_examples: 278
- name: validation
num_bytes: 205293
num_examples: 289
- name: test
num_bytes: 433570
num_examples: 534
download_size: 1018634
dataset_size: 860449
- config_name: ta
features:
- name: _id
dtype: string
- name: query
dtype: string
- name: title
dtype: string
- name: text
dtype: string
- name: language
dtype: string
- name: code
dtype: string
splits:
- name: train
num_bytes: 223414
num_examples: 278
- name: validation
num_bytes: 206155
num_examples: 288
- name: test
num_bytes: 434283
num_examples: 534
download_size: 1017250
dataset_size: 863852
- config_name: ur
features:
- name: _id
dtype: string
- name: query
dtype: string
- name: title
dtype: string
- name: text
dtype: string
- name: language
dtype: string
- name: code
dtype: string
splits:
- name: train
num_bytes: 206610
num_examples: 279
- name: validation
num_bytes: 189744
num_examples: 291
- name: test
num_bytes: 404685
num_examples: 538
download_size: 997046
dataset_size: 801039
configs:
- config_name: as
data_files:
- split: train
path: as/train-*
- split: validation
path: as/validation-*
- split: test
path: as/test-*
- config_name: bho
data_files:
- split: train
path: bho/train-*
- split: validation
path: bho/validation-*
- split: test
path: bho/test-*
- config_name: brx
data_files:
- split: train
path: brx/train-*
- split: validation
path: brx/validation-*
- split: test
path: brx/test-*
- config_name: gbm
data_files:
- split: train
path: gbm/train-*
- split: validation
path: gbm/validation-*
- split: test
path: gbm/test-*
- config_name: gom
data_files:
- split: train
path: gom/train-*
- split: validation
path: gom/validation-*
- split: test
path: gom/test-*
- config_name: gu
data_files:
- split: train
path: gu/train-*
- split: validation
path: gu/validation-*
- split: test
path: gu/test-*
- config_name: hi
data_files:
- split: train
path: hi/train-*
- split: validation
path: hi/validation-*
- split: test
path: hi/test-*
- config_name: hne
data_files:
- split: train
path: hne/train-*
- split: validation
path: hne/validation-*
- split: test
path: hne/test-*
- config_name: kn
data_files:
- split: train
path: kn/train-*
- split: validation
path: kn/validation-*
- split: test
path: kn/test-*
- config_name: mai
data_files:
- split: train
path: mai/train-*
- split: validation
path: mai/validation-*
- split: test
path: mai/test-*
- config_name: ml
data_files:
- split: train
path: ml/train-*
- split: validation
path: ml/validation-*
- split: test
path: ml/test-*
- config_name: mni
data_files:
- split: train
path: mni/train-*
- split: validation
path: mni/validation-*
- split: test
path: mni/test-*
- config_name: mr
data_files:
- split: train
path: mr/train-*
- split: validation
path: mr/validation-*
- split: test
path: mr/test-*
- config_name: mwr
data_files:
- split: train
path: mwr/train-*
- split: validation
path: mwr/validation-*
- split: test
path: mwr/test-*
- config_name: or
data_files:
- split: train
path: or/train-*
- split: validation
path: or/validation-*
- split: test
path: or/test-*
- config_name: pa
data_files:
- split: train
path: pa/train-*
- split: validation
path: pa/validation-*
- split: test
path: pa/test-*
- config_name: ps
data_files:
- split: train
path: ps/train-*
- split: validation
path: ps/validation-*
- split: test
path: ps/test-*
- config_name: sa
data_files:
- split: train
path: sa/train-*
- split: validation
path: sa/validation-*
- split: test
path: sa/test-*
- config_name: ta
data_files:
- split: train
path: ta/train-*
- split: validation
path: ta/validation-*
- split: test
path: ta/test-*
- config_name: ur
data_files:
- split: train
path: ur/train-*
- split: validation
path: ur/validation-*
- split: test
path: ur/test-*
---
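The `configs` section above maps each language configuration to one file glob per split. The following is a minimal sketch, not the dataset author's own tooling, of how those globs are formed and how a single configuration could be loaded with the third-party `datasets` package. The helper names `split_patterns` and `load_config` are hypothetical; `load_config` needs network access and is not called here.

```python
# Language configurations listed in the card above.
CONFIGS = ["as", "bho", "brx", "gbm", "gom", "gu", "hi", "hne", "kn", "mai",
           "ml", "mni", "mr", "mwr", "or", "pa", "ps", "sa", "ta", "ur"]
SPLITS = ("train", "validation", "test")
REPO_ID = "nthakur/xtreme-up-retrieval-cross-lang"


def split_patterns(config: str) -> dict:
    """Return the per-split file globs the card declares for one config."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    return {split: f"{config}/{split}-*" for split in SPLITS}


def load_config(config: str, split: str = "test"):
    """Load one split of one config from the Hub (hypothetical helper)."""
    # Imported lazily so the rest of this sketch runs without the package.
    from datasets import load_dataset

    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    return load_dataset(REPO_ID, config, split=split)


if __name__ == "__main__":
    print(split_patterns("hi"))
```

Each returned record is expected to carry the six string fields declared under `features`: `_id`, `query`, `title`, `text`, `language`, and `code`.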
Provider:
nthakur
Summary of original information
Dataset overview
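Dataset configuration: as
- Features: _id (string), query (string), title (string), text (string), language (string), code (string)
- Splits:
  - train: 227 examples, 186950 bytes
  - validation: 260 examples, 188857 bytes
  - test: 537 examples, 440015 bytes
- Download size: 1074315 bytes
- Dataset size: 815822 bytes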
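Dataset configuration: bho
- Features: _id (string), query (string), title (string), text (string), language (string), code (string)
- Splits:
  - train: 230 examples, 184034 bytes
  - validation: 262 examples, 182929 bytes
  - test: 535 examples, 426805 bytes
- Download size: 944336 bytes
- Dataset size: 793768 bytes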
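Dataset configuration: brx
- Features: _id (string), query (string), title (string), text (string), language (string), code (string)
- Splits:
  - train: 230 examples, 185784 bytes
  - validation: 261 examples, 185331 bytes
  - test: 537 examples, 433560 bytes
- Download size: 1070850 bytes
- Dataset size: 804675 bytes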
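Dataset configuration: gbm
- Features: _id (string), query (string), title (string), text (string), language (string), code (string)
- Splits:
  - train: 230 examples, 185208 bytes
  - validation: 261 examples, 184119 bytes
  - test: 538 examples, 426897 bytes
- Download size: 946654 bytes
- Dataset size: 796224 bytes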
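Dataset configuration: gom
- Features: _id (string), query (string), title (string), text (string), language (string), code (string)
- Splits:
  - train: 235 examples, 194506 bytes
  - validation: 269 examples, 192119 bytes
  - test: 533 examples, 432531 bytes
- Download size: 964906 bytes
- Dataset size: 819156 bytes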
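Dataset configuration: gu
- Features: _id (string), query (string), title (string), text (string), language (string), code (string)
- Splits:
  - train: 241 examples, 194290 bytes
  - validation: 275 examples, 193151 bytes
  - test: 534 examples, 426459 bytes
- Download size: 971538 bytes
- Dataset size: 813900 bytes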
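Dataset configuration: hi
- Features: _id (string), query (string), title (string), text (string), language (string), code (string)
- Splits:
  - train: 278 examples, 219153 bytes
  - validation: 276 examples, 192465 bytes
  - test: 539 examples, 429235 bytes
- Download size: 1003624 bytes
- Dataset size: 840853 bytes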
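Dataset configuration: hne
- Features: _id (string), query (string), title (string), text (string), language (string), code (string)
- Splits:
  - train: 277 examples, 221342 bytes
  - validation: 276 examples, 196040 bytes
  - test: 533 examples, 430852 bytes
- Download size: 1005200 bytes
- Dataset size: 848234 bytes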
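Dataset configuration: kn
- Features: _id (string), query (string), title (string), text (string), language (string), code (string)
- Splits:
  - train: 278 examples, 221147 bytes
  - validation: 277 examples, 196488 bytes
  - test: 535 examples, 428691 bytes
- Download size: 1007264 bytes
- Dataset size: 846326 bytes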
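Dataset configuration: mai
- Features: _id (string), query (string), title (string), text (string), language (string), code (string)
- Splits:
  - train: 276 examples, 223440 bytes
  - validation: 273 examples, 196049 bytes
  - test: 539 examples, 431545 bytes
- Download size: 1005560 bytes
- Dataset size: 851034 bytes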
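Dataset configuration: ml
- Features: _id (string), query (string), title (string), text (string), language (string), code (string)
- Splits:
  - train: 278 examples, 225396 bytes
  - validation: 277 examples, 200574 bytes
  - test: 533 examples, 436795 bytes
- Download size: 1014162 bytes
- Dataset size: 862765 bytes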
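Dataset configuration: mni
- Features: _id (string), query (string), title (string), text (string), language (string), code (string)
- Splits:
  - train: 278 examples, 225672 bytes
  - validation: 284 examples, 206226 bytes
  - test: 532 examples, 440315 bytes
- Download size: 1909577 bytes
- Dataset size: 872213 bytes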
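Dataset configuration: mr
- Features: _id (string), query (string), title (string), text (string), language (string), code (string)
- Splits:
  - train: 276 examples, 219673 bytes
  - validation: 289 examples, 202734 bytes
  - test: 537 examples, 429963 bytes
- Download size: 1016040 bytes
- Dataset size: 852370 bytes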
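Dataset configuration: mwr
- Features: _id (string), query (string), title (string), text (string), language (string), code (string)
- Splits:
  - train: 279 examples, 217962 bytes
  - validation: 290 examples, 200319 bytes
  - test: 536 examples, 424328 bytes
- Download size: 1012502 bytes
- Dataset size: 842609 bytes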
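Dataset configuration: or
- Features: _id (string), query (string), title (string), text (string), language (string), code (string)
- Splits:
  - train: 278 examples, 221936 bytes
  - validation: 290 examples, 206278 bytes
  - test: 536 examples, 422930 bytes
- Download size: 1017246 bytes
- Dataset size: 851144 bytes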
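Dataset configuration: pa
- Features: _id (string), query (string), title (string), text (string), language (string), code (string)
- Splits:
  - train: 277 examples, 217525 bytes
  - validation: 289 examples, 201150 bytes
  - test: 528 examples, 418441 bytes
- Download size: 1003902 bytes
- Dataset size: 837116 bytes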
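Dataset configuration: ps
- Features: _id (string), query (string), title (string), text (string), language (string), code (string)
- Splits:
  - train: 277 examples, 206474 bytes
  - validation: 288 examples, 188937 bytes
  - test: 537 examples, 404831 bytes
- Download size: 993728 bytes
- Dataset size: 800242 bytes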
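Dataset configuration: sa
- Features: _id (string), query (string), title (string), text (string), language (string), code (string)
- Splits:
  - train: 278 examples, 221586 bytes
  - validation: 289 examples, 205293 bytes
  - test: 534 examples, 433570 bytes
- Download size: 1018634 bytes
- Dataset size: 860449 bytes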
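Dataset configuration: ta
- Features: _id (string), query (string), title (string), text (string), language (string), code (string)
- Splits:
  - train: 278 examples, 223414 bytes
  - validation: 288 examples, 206155 bytes
  - test: 534 examples, 434283 bytes
- Download size: 1017250 bytes
- Dataset size: 863852 bytes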
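Dataset configuration: ur
- Features: _id (string), query (string), title (string), text (string), language (string), code (string)
- Splits:
  - train: 279 examples, 206610 bytes
  - validation: 291 examples, 189744 bytes
  - test: 538 examples, 404685 bytes
- Download size: 997046 bytes
- Dataset size: 801039 bytes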
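
In the figures listed for each configuration, the dataset size appears to be the sum of the three split sizes in bytes. A small sketch of that arithmetic for the `as` and `hi` configurations, using numbers copied from this card (the `CARD` dictionary and `sizes_consistent` helper are illustrative names, not part of the dataset):

```python
# Split sizes in bytes (train, validation, test) and the declared
# dataset size, copied from this card for two configurations.
CARD = {
    "as": {"splits": (186950, 188857, 440015), "dataset_size": 815822},
    "hi": {"splits": (219153, 192465, 429235), "dataset_size": 840853},
}


def sizes_consistent(entry: dict) -> bool:
    """Check that the split byte counts add up to the declared dataset size."""
    return sum(entry["splits"]) == entry["dataset_size"]


if __name__ == "__main__":
    for name, entry in CARD.items():
        print(name, sum(entry["splits"]), sizes_consistent(entry))
```

The download size is a separate figure (the size of the stored files) and is not expected to match this sum.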



