
nthakur/mkqa-open-domain

Source: Hugging Face. Updated: 2024-05-15. Indexed: 2024-06-12.
Download link:
https://hf-mirror.com/datasets/nthakur/mkqa-open-domain
Resource description:
---
dataset_info:
- config_name: ar
  features:
  - name: id
    dtype: int64
  - name: query
    dtype: string
  - name: answers
    sequence: string
  splits:
  - name: train
    num_bytes: 890413.4359277893
    num_examples: 6500
  - name: test
    num_bytes: 35342.56407221071
    num_examples: 258
  download_size: 524191
  dataset_size: 925756.0
- config_name: de
  features:
  - name: id
    dtype: int64
  - name: query
    dtype: string
  - name: answers
    sequence: string
  splits:
  - name: train
    num_bytes: 729023.3796981355
    num_examples: 6500
  - name: test
    num_bytes: 28936.620301864456
    num_examples: 258
  download_size: 499976
  dataset_size: 757960.0
- config_name: en
  features:
  - name: id
    dtype: int64
  - name: query
    dtype: string
  - name: answers
    sequence: string
  splits:
  - name: train
    num_bytes: 762704.4983722995
    num_examples: 6500
  - name: test
    num_bytes: 30273.5016277005
    num_examples: 258
  download_size: 497421
  dataset_size: 792978.0
- config_name: es
  features:
  - name: id
    dtype: int64
  - name: query
    dtype: string
  - name: answers
    sequence: string
  splits:
  - name: train
    num_bytes: 744583.7525895236
    num_examples: 6500
  - name: test
    num_bytes: 29554.247410476473
    num_examples: 258
  download_size: 496828
  dataset_size: 774138.0
- config_name: fi
  features:
  - name: id
    dtype: int64
  - name: query
    dtype: string
  - name: answers
    sequence: string
  splits:
  - name: train
    num_bytes: 648983.3530630365
    num_examples: 6500
  - name: test
    num_bytes: 25759.6469369636
    num_examples: 258
  download_size: 447412
  dataset_size: 674743.0
- config_name: fr
  features:
  - name: id
    dtype: int64
  - name: query
    dtype: string
  - name: answers
    sequence: string
  splits:
  - name: train
    num_bytes: 733298.6830423202
    num_examples: 6500
  - name: test
    num_bytes: 29106.316957679788
    num_examples: 258
  download_size: 492979
  dataset_size: 762405.0
- config_name: ja
  features:
  - name: id
    dtype: int64
  - name: query
    dtype: string
  - name: answers
    sequence: string
  splits:
  - name: train
    num_bytes: 868958.0497188517
    num_examples: 6500
  - name: test
    num_bytes: 34490.95028114827
    num_examples: 258
  download_size: 530396
  dataset_size: 903449.0
- config_name: ko
  features:
  - name: id
    dtype: int64
  - name: query
    dtype: string
  - name: answers
    sequence: string
  splits:
  - name: train
    num_bytes: 695648.1207457827
    num_examples: 6500
  - name: test
    num_bytes: 27611.879254217223
    num_examples: 258
  download_size: 461070
  dataset_size: 723260.0
- config_name: ru
  features:
  - name: id
    dtype: int64
  - name: query
    dtype: string
  - name: answers
    sequence: string
  splits:
  - name: train
    num_bytes: 1048036.0313702279
    num_examples: 6500
  - name: test
    num_bytes: 41598.96862977212
    num_examples: 258
  download_size: 613678
  dataset_size: 1089635.0
- config_name: th
  features:
  - name: id
    dtype: int64
  - name: query
    dtype: string
  - name: answers
    sequence: string
  splits:
  - name: train
    num_bytes: 1178262.0597810003
    num_examples: 6500
  - name: test
    num_bytes: 46767.940218999705
    num_examples: 258
  download_size: 609139
  dataset_size: 1225030.0
- config_name: zh
  features:
  - name: id
    dtype: int64
  - name: query
    dtype: string
  - name: answers
    sequence: string
  splits:
  - name: train
    num_bytes: 595897.4548683042
    num_examples: 6500
  - name: test
    num_bytes: 23652.54513169577
    num_examples: 258
  download_size: 428244
  dataset_size: 619550.0
configs:
- config_name: ar
  data_files:
  - split: train
    path: ar/train-*
  - split: test
    path: ar/test-*
- config_name: de
  data_files:
  - split: train
    path: de/train-*
  - split: test
    path: de/test-*
- config_name: en
  data_files:
  - split: train
    path: en/train-*
  - split: test
    path: en/test-*
- config_name: es
  data_files:
  - split: train
    path: es/train-*
  - split: test
    path: es/test-*
- config_name: fi
  data_files:
  - split: train
    path: fi/train-*
  - split: test
    path: fi/test-*
- config_name: fr
  data_files:
  - split: train
    path: fr/train-*
  - split: test
    path: fr/test-*
- config_name: ja
  data_files:
  - split: train
    path: ja/train-*
  - split: test
    path: ja/test-*
- config_name: ko
  data_files:
  - split: train
    path: ko/train-*
  - split: test
    path: ko/test-*
- config_name: ru
  data_files:
  - split: train
    path: ru/train-*
  - split: test
    path: ru/test-*
- config_name: th
  data_files:
  - split: train
    path: th/train-*
  - split: test
    path: th/test-*
- config_name: zh
  data_files:
  - split: train
    path: zh/train-*
  - split: test
    path: zh/test-*
---
Provider:
nthakur
Summary of original information

Dataset overview

Dataset configurations

Arabic (ar)

  • Features:
    • id: int64
    • query: string
    • answers: sequence of string
  • Splits:
    • train:
      • Size (bytes): 890413.4359277893
      • Examples: 6500
    • test:
      • Size (bytes): 35342.56407221071
      • Examples: 258
  • Download size (bytes): 524191
  • Dataset size (bytes): 925756.0
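
The per-split byte counts are fractional, which looks odd for a file size. The figures are consistent with the total dataset size having been divided between train and test in proportion to the number of examples, although the listing does not say how they were produced. The following is a minimal sketch that checks this reading against the Arabic (ar) numbers; the function name `proportional_bytes` is introduced here for illustration only.

```python
# Figures copied from the Arabic (ar) configuration in the listing.
TRAIN_EXAMPLES = 6500
TEST_EXAMPLES = 258
DATASET_SIZE = 925756.0          # bytes, train + test
TRAIN_BYTES = 890413.4359277893  # reported train size in bytes
TEST_BYTES = 35342.56407221071   # reported test size in bytes


def proportional_bytes(total_bytes: float, n_split: int, n_total: int) -> float:
    """Share of total_bytes for a split holding n_split of n_total examples."""
    return total_bytes * n_split / n_total


total_examples = TRAIN_EXAMPLES + TEST_EXAMPLES
train_estimate = proportional_bytes(DATASET_SIZE, TRAIN_EXAMPLES, total_examples)
test_estimate = proportional_bytes(DATASET_SIZE, TEST_EXAMPLES, total_examples)

print(total_examples)                           # 6758
print(abs(train_estimate - TRAIN_BYTES) < 1e-3)  # True if the proportional reading holds
print(abs(test_estimate - TEST_BYTES) < 1e-3)
```

The same check can be repeated for any other configuration by swapping in its reported sizes; every configuration lists the same 6500/258 split.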

German (de)

  • Features:
    • id: int64
    • query: string
    • answers: sequence of string
  • Splits:
    • train:
      • Size (bytes): 729023.3796981355
      • Examples: 6500
    • test:
      • Size (bytes): 28936.620301864456
      • Examples: 258
  • Download size (bytes): 499976
  • Dataset size (bytes): 757960.0

English (en)

  • Features:
    • id: int64
    • query: string
    • answers: sequence of string
  • Splits:
    • train:
      • Size (bytes): 762704.4983722995
      • Examples: 6500
    • test:
      • Size (bytes): 30273.5016277005
      • Examples: 258
  • Download size (bytes): 497421
  • Dataset size (bytes): 792978.0

Spanish (es)

  • Features:
    • id: int64
    • query: string
    • answers: sequence of string
  • Splits:
    • train:
      • Size (bytes): 744583.7525895236
      • Examples: 6500
    • test:
      • Size (bytes): 29554.247410476473
      • Examples: 258
  • Download size (bytes): 496828
  • Dataset size (bytes): 774138.0

Finnish (fi)

  • Features:
    • id: int64
    • query: string
    • answers: sequence of string
  • Splits:
    • train:
      • Size (bytes): 648983.3530630365
      • Examples: 6500
    • test:
      • Size (bytes): 25759.6469369636
      • Examples: 258
  • Download size (bytes): 447412
  • Dataset size (bytes): 674743.0

French (fr)

  • Features:
    • id: int64
    • query: string
    • answers: sequence of string
  • Splits:
    • train:
      • Size (bytes): 733298.6830423202
      • Examples: 6500
    • test:
      • Size (bytes): 29106.316957679788
      • Examples: 258
  • Download size (bytes): 492979
  • Dataset size (bytes): 762405.0

Japanese (ja)

  • Features:
    • id: int64
    • query: string
    • answers: sequence of string
  • Splits:
    • train:
      • Size (bytes): 868958.0497188517
      • Examples: 6500
    • test:
      • Size (bytes): 34490.95028114827
      • Examples: 258
  • Download size (bytes): 530396
  • Dataset size (bytes): 903449.0

Korean (ko)

  • Features:
    • id: int64
    • query: string
    • answers: sequence of string
  • Splits:
    • train:
      • Size (bytes): 695648.1207457827
      • Examples: 6500
    • test:
      • Size (bytes): 27611.879254217223
      • Examples: 258
  • Download size (bytes): 461070
  • Dataset size (bytes): 723260.0

Russian (ru)

  • Features:
    • id: int64
    • query: string
    • answers: sequence of string
  • Splits:
    • train:
      • Size (bytes): 1048036.0313702279
      • Examples: 6500
    • test:
      • Size (bytes): 41598.96862977212
      • Examples: 258
  • Download size (bytes): 613678
  • Dataset size (bytes): 1089635.0

Thai (th)

  • Features:
    • id: int64
    • query: string
    • answers: sequence of string
  • Splits:
    • train:
      • Size (bytes): 1178262.0597810003
      • Examples: 6500
    • test:
      • Size (bytes): 46767.940218999705
      • Examples: 258
  • Download size (bytes): 609139
  • Dataset size (bytes): 1225030.0

Chinese (zh)

  • Features:
    • id: int64
    • query: string
    • answers: sequence of string
  • Splits:
    • train:
      • Size (bytes): 595897.4548683042
      • Examples: 6500
    • test:
      • Size (bytes): 23652.54513169577
      • Examples: 258
  • Download size (bytes): 428244
  • Dataset size (bytes): 619550.0
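
Each of the eleven configurations shares the same three fields and the same two splits, so loading can be parameterized by language code. The sketch below assumes the Hugging Face `datasets` library is installed and that the Hub is reachable; `load_split` and `is_valid_record` are illustrative helper names, not part of the dataset or the library.

```python
from typing import Any

DATASET_ID = "nthakur/mkqa-open-domain"
CONFIGS = ["ar", "de", "en", "es", "fi", "fr", "ja", "ko", "ru", "th", "zh"]
SPLITS = ("train", "test")


def load_split(lang: str, split: str = "test"):
    """Load one split of one language configuration from the Hugging Face Hub."""
    if lang not in CONFIGS:
        raise ValueError(f"unknown config {lang!r}; expected one of {CONFIGS}")
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the helpers below work without the library or a network.
    from datasets import load_dataset
    return load_dataset(DATASET_ID, lang, split=split)


def is_valid_record(record: dict[str, Any]) -> bool:
    """Check a row against the listed features: id (int), query (str), answers (list of str)."""
    return (
        isinstance(record.get("id"), int)
        and isinstance(record.get("query"), str)
        and isinstance(record.get("answers"), list)
        and all(isinstance(answer, str) for answer in record["answers"])
    )
```

Going by the listing, `load_split("en", "test")` should return 258 rows and `load_split("en", "train")` 6500; passing each row through `is_valid_record` is a cheap way to confirm the schema before use.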

Data file paths

  • Arabic (ar):
    • train: ar/train-*
    • test: ar/test-*
  • German (de):
    • train: de/train-*
    • test: de/test-*
  • English (en):
    • train: en/train-*
    • test: en/test-*
  • Spanish (es):
    • train: es/train-*
    • test: es/test-*
  • Finnish (fi):
    • train: fi/train-*
    • test: fi/test-*
  • French (fr):
    • train: fr/train-*
    • test: fr/test-*
  • Japanese (ja):
    • train: ja/train-*
    • test: ja/test-*
  • Korean (ko):
    • train: ko/train-*
    • test: ko/test-*
  • Russian (ru):
    • train: ru/train-*
    • test: ru/test-*
  • Thai (th):
    • train: th/train-*
    • test: th/test-*
  • Chinese (zh):
    • train: zh/train-*
    • test: zh/test-*
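
The path entries all follow one glob pattern, `<config>/<split>-*`. As a rough sketch, the pattern can be generated and matched against a file listing with the standard library. The shard file names in `example_files` are hypothetical, since the listing gives only the patterns, and `split_pattern` and `files_for` are illustrative names.

```python
from fnmatch import fnmatchcase


def split_pattern(lang: str, split: str) -> str:
    """Build the glob pattern used for one configuration and split."""
    return f"{lang}/{split}-*"


def files_for(lang: str, split: str, repo_files: list[str]) -> list[str]:
    """Return the files in repo_files that match the pattern for lang and split."""
    pattern = split_pattern(lang, split)
    return [path for path in repo_files if fnmatchcase(path, pattern)]


# Hypothetical file listing; real shard names are not given in the source.
example_files = [
    "ar/train-00000-of-00001.parquet",
    "ar/test-00000-of-00001.parquet",
    "zh/train-00000-of-00001.parquet",
]

print(files_for("ar", "test", example_files))
# ['ar/test-00000-of-00001.parquet']
```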