nthakur/xtreme-up-retrieval-cross-lang

Source: Hugging Face. Updated 2024-04-27. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/nthakur/xtreme-up-retrieval-cross-lang
Resource description:
Dataset card metadata (YAML front matter, condensed). The card defines 20 configs, one per language code: as, bho, brx, gbm, gom, gu, hi, hne, kn, mai, ml, mni, mr, mwr, or, pa, ps, sa, ta, ur.

  • dataset_info: every config declares the same six features, all with dtype string (_id, query, title, text, language, code), and the same three splits (train, validation, test). The num_examples, num_bytes, download_size and dataset_size values are the per-config figures listed below.
  • configs: every config maps its splits to data files with the same pattern: train at <config>/train-*, validation at <config>/validation-*, test at <config>/test-* (for example as/train-*, as/validation-*, as/test-*).
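
The metadata above maps each config to one file pattern per split. The following is a minimal sketch of how one config might be loaded with the Hugging Face `datasets` library, assuming that library is installed and the Hub is reachable. The helper names `split_patterns` and `load_config` are illustrative and do not come from the dataset card.

```python
REPO_ID = "nthakur/xtreme-up-retrieval-cross-lang"
SPLITS = ("train", "validation", "test")


def split_patterns(config: str) -> dict[str, str]:
    """Return the per-split file patterns listed in the card, e.g. 'as/train-*'."""
    return {split: f"{config}/{split}-*" for split in SPLITS}


def load_config(config: str, split: str = "test"):
    """Load one split of one language config (requires network access)."""
    # Imported lazily so split_patterns works without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config, split=split)


if __name__ == "__main__":
    print(split_patterns("as"))
    # Uncomment to download the test split of the `as` config:
    # rows = load_config("as", split="test")
    # print(rows[0]["query"])
```

Passing the config name as the second argument selects one of the 20 language subsets; the `split` argument selects train, validation, or test.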
Provider:
nthakur
Summary of original information

Dataset overview

Dataset config: as

  • Features:
    • _id: string
    • query: string
    • title: string
    • text: string
    • language: string
    • code: string
  • Splits:
    • train: 227 examples, 186950 bytes
    • validation: 260 examples, 188857 bytes
    • test: 537 examples, 440015 bytes
  • Download size: 1074315 bytes
  • Dataset size: 815822 bytes
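
Every config lists the same six string features. As a rough sketch of what that schema implies for a single row, the check below reports missing or non-string fields. The function name `schema_problems` and the sample row are made up for illustration; the row is a placeholder, not real dataset content.

```python
from typing import Any, Mapping

# Feature names as listed for every config; all are typed as strings.
FEATURES = ("_id", "query", "title", "text", "language", "code")


def schema_problems(row: Mapping[str, Any]) -> list[str]:
    """Return human-readable schema problems (empty list if the row is fine)."""
    problems = []
    for name in FEATURES:
        if name not in row:
            problems.append(f"missing field: {name}")
        elif not isinstance(row[name], str):
            problems.append(f"{name} is {type(row[name]).__name__}, expected str")
    return problems


# Placeholder row for illustration only (hypothetical values).
example = {name: f"placeholder {name}" for name in FEATURES}

print(schema_problems(example))      # []
print(schema_problems({"_id": 1}))   # one type problem, five missing fields
```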

Dataset config: bho

  • Features:
    • _id: string
    • query: string
    • title: string
    • text: string
    • language: string
    • code: string
  • Splits:
    • train: 230 examples, 184034 bytes
    • validation: 262 examples, 182929 bytes
    • test: 535 examples, 426805 bytes
  • Download size: 944336 bytes
  • Dataset size: 793768 bytes
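
In the figures listed so far, the dataset size equals the sum of the three split sizes, while the download size is a separate number. A small sketch of that arithmetic check, using the `as` and `bho` values copied from the listings above (the names `LISTED` and `size_matches` are illustrative):

```python
# Byte counts copied from the `as` and `bho` listings.
LISTED = {
    "as": {
        "splits": {"train": 186950, "validation": 188857, "test": 440015},
        "dataset_size": 815822,
    },
    "bho": {
        "splits": {"train": 184034, "validation": 182929, "test": 426805},
        "dataset_size": 793768,
    },
}


def size_matches(entry: dict) -> bool:
    """True when the listed dataset size equals the sum of the split byte counts."""
    return sum(entry["splits"].values()) == entry["dataset_size"]


for name, entry in LISTED.items():
    print(name, size_matches(entry))  # prints "as True", then "bho True"
```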

Dataset config: brx

  • Features:
    • _id: string
    • query: string
    • title: string
    • text: string
    • language: string
    • code: string
  • Splits:
    • train: 230 examples, 185784 bytes
    • validation: 261 examples, 185331 bytes
    • test: 537 examples, 433560 bytes
  • Download size: 1070850 bytes
  • Dataset size: 804675 bytes

Dataset config: gbm

  • Features:
    • _id: string
    • query: string
    • title: string
    • text: string
    • language: string
    • code: string
  • Splits:
    • train: 230 examples, 185208 bytes
    • validation: 261 examples, 184119 bytes
    • test: 538 examples, 426897 bytes
  • Download size: 946654 bytes
  • Dataset size: 796224 bytes

Dataset config: gom

  • Features:
    • _id: string
    • query: string
    • title: string
    • text: string
    • language: string
    • code: string
  • Splits:
    • train: 235 examples, 194506 bytes
    • validation: 269 examples, 192119 bytes
    • test: 533 examples, 432531 bytes
  • Download size: 964906 bytes
  • Dataset size: 819156 bytes

Dataset config: gu

  • Features:
    • _id: string
    • query: string
    • title: string
    • text: string
    • language: string
    • code: string
  • Splits:
    • train: 241 examples, 194290 bytes
    • validation: 275 examples, 193151 bytes
    • test: 534 examples, 426459 bytes
  • Download size: 971538 bytes
  • Dataset size: 813900 bytes

Dataset config: hi

  • Features:
    • _id: string
    • query: string
    • title: string
    • text: string
    • language: string
    • code: string
  • Splits:
    • train: 278 examples, 219153 bytes
    • validation: 276 examples, 192465 bytes
    • test: 539 examples, 429235 bytes
  • Download size: 1003624 bytes
  • Dataset size: 840853 bytes

Dataset config: hne

  • Features:
    • _id: string
    • query: string
    • title: string
    • text: string
    • language: string
    • code: string
  • Splits:
    • train: 277 examples, 221342 bytes
    • validation: 276 examples, 196040 bytes
    • test: 533 examples, 430852 bytes
  • Download size: 1005200 bytes
  • Dataset size: 848234 bytes

Dataset config: kn

  • Features:
    • _id: string
    • query: string
    • title: string
    • text: string
    • language: string
    • code: string
  • Splits:
    • train: 278 examples, 221147 bytes
    • validation: 277 examples, 196488 bytes
    • test: 535 examples, 428691 bytes
  • Download size: 1007264 bytes
  • Dataset size: 846326 bytes

Dataset config: mai

  • Features:
    • _id: string
    • query: string
    • title: string
    • text: string
    • language: string
    • code: string
  • Splits:
    • train: 276 examples, 223440 bytes
    • validation: 273 examples, 196049 bytes
    • test: 539 examples, 431545 bytes
  • Download size: 1005560 bytes
  • Dataset size: 851034 bytes

Dataset config: ml

  • Features:
    • _id: string
    • query: string
    • title: string
    • text: string
    • language: string
    • code: string
  • Splits:
    • train: 278 examples, 225396 bytes
    • validation: 277 examples, 200574 bytes
    • test: 533 examples, 436795 bytes
  • Download size: 1014162 bytes
  • Dataset size: 862765 bytes

Dataset config: mni

  • Features:
    • _id: string
    • query: string
    • title: string
    • text: string
    • language: string
    • code: string
  • Splits:
    • train: 278 examples, 225672 bytes
    • validation: 284 examples, 206226 bytes
    • test: 532 examples, 440315 bytes
  • Download size: 1909577 bytes
  • Dataset size: 872213 bytes

Dataset config: mr

  • Features:
    • _id: string
    • query: string
    • title: string
    • text: string
    • language: string
    • code: string
  • Splits:
    • train: 276 examples, 219673 bytes
    • validation: 289 examples, 202734 bytes
    • test: 537 examples, 429963 bytes
  • Download size: 1016040 bytes
  • Dataset size: 852370 bytes

Dataset config: mwr

  • Features:
    • _id: string
    • query: string
    • title: string
    • text: string
    • language: string
    • code: string
  • Splits:
    • train: 279 examples, 217962 bytes
    • validation: 290 examples, 200319 bytes
    • test: 536 examples, 424328 bytes
  • Download size: 1012502 bytes
  • Dataset size: 842609 bytes

Dataset config: or

  • Features:
    • _id: string
    • query: string
    • title: string
    • text: string
    • language: string
    • code: string
  • Splits:
    • train: 278 examples, 221936 bytes
    • validation: 290 examples, 206278 bytes
    • test: 536 examples, 422930 bytes
  • Download size: 1017246 bytes
  • Dataset size: 851144 bytes

Dataset config: pa

  • Features:
    • _id: string
    • query: string
    • title: string
    • text: string
    • language: string
    • code: string
  • Splits:
    • train: 277 examples, 217525 bytes
    • validation: 289 examples, 201150 bytes
    • test: 528 examples, 418441 bytes
  • Download size: 1003902 bytes
  • Dataset size: 837116 bytes

Dataset config: ps

  • Features:
    • _id: string
    • query: string
    • title: string
    • text: string
    • language: string
    • code: string
  • Splits:
    • train: 277 examples, 206474 bytes
    • validation: 288 examples, 188937 bytes
    • test: 537 examples, 404831 bytes
  • Download size: 993728 bytes
  • Dataset size: 800242 bytes

Dataset config: sa

  • Features:
    • _id: string
    • query: string
    • title: string
    • text: string
    • language: string
    • code: string
  • Splits:
    • train: 278 examples, 221586 bytes
    • validation: 289 examples, 205293 bytes
    • test: 534 examples, 433570 bytes
  • Download size: 1018634 bytes
  • Dataset size: 860449 bytes

Dataset config: ta

  • Features:
    • _id: string
    • query: string
    • title: string
    • text: string
    • language: string
    • code: string
  • Splits:
    • train: 278 examples, 223414 bytes
    • validation: 288 examples, 206155 bytes
    • test: 534 examples, 434283 bytes
  • Download size: 1017250 bytes
  • Dataset size: 863852 bytes

Dataset config: ur

  • Features:
    • _id: string
    • query: string
    • title: string
    • text: string
    • language: string
    • code: string
  • Splits:
    • train: 279 examples, 206610 bytes
    • validation: 291 examples, 189744 bytes
    • test: 538 examples, 404685 bytes
  • Download size: 997046 bytes
  • Dataset size: 801039 bytes
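
The per-config example counts above can be totalled to see the overall size of each split. The sketch below copies the listed counts into a table and sums them; the names `EXAMPLES` and `split_totals` are illustrative only.

```python
# (train, validation, test) example counts copied from the per-config listings.
EXAMPLES = {
    "as": (227, 260, 537), "bho": (230, 262, 535), "brx": (230, 261, 537),
    "gbm": (230, 261, 538), "gom": (235, 269, 533), "gu": (241, 275, 534),
    "hi": (278, 276, 539), "hne": (277, 276, 533), "kn": (278, 277, 535),
    "mai": (276, 273, 539), "ml": (278, 277, 533), "mni": (278, 284, 532),
    "mr": (276, 289, 537), "mwr": (279, 290, 536), "or": (278, 290, 536),
    "pa": (277, 289, 528), "ps": (277, 288, 537), "sa": (278, 289, 534),
    "ta": (278, 288, 534), "ur": (279, 291, 538),
}


def split_totals(counts: dict[str, tuple[int, int, int]]) -> dict[str, int]:
    """Sum the example counts of each split over all configs."""
    names = ("train", "validation", "test")
    return {name: sum(row[i] for row in counts.values()) for i, name in enumerate(names)}


totals = split_totals(EXAMPLES)
print(len(EXAMPLES), "configs")   # 20 configs
print(totals)                     # {'train': 5280, 'validation': 5565, 'test': 10705}
print(sum(totals.values()))       # 21550
```

In every config the test split is the largest of the three, at roughly twice the size of the train or validation split.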