nthakur/mkqa-open-domain
Source: Hugging Face. Updated 2024-05-15; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/nthakur/mkqa-open-domain
Resource description:
---
dataset_info:
- config_name: ar
  features:
  - name: id
    dtype: int64
  - name: query
    dtype: string
  - name: answers
    sequence: string
  splits:
  - name: train
    num_bytes: 890413.4359277893
    num_examples: 6500
  - name: test
    num_bytes: 35342.56407221071
    num_examples: 258
  download_size: 524191
  dataset_size: 925756.0
- config_name: de
  features:
  - name: id
    dtype: int64
  - name: query
    dtype: string
  - name: answers
    sequence: string
  splits:
  - name: train
    num_bytes: 729023.3796981355
    num_examples: 6500
  - name: test
    num_bytes: 28936.620301864456
    num_examples: 258
  download_size: 499976
  dataset_size: 757960.0
- config_name: en
  features:
  - name: id
    dtype: int64
  - name: query
    dtype: string
  - name: answers
    sequence: string
  splits:
  - name: train
    num_bytes: 762704.4983722995
    num_examples: 6500
  - name: test
    num_bytes: 30273.5016277005
    num_examples: 258
  download_size: 497421
  dataset_size: 792978.0
- config_name: es
  features:
  - name: id
    dtype: int64
  - name: query
    dtype: string
  - name: answers
    sequence: string
  splits:
  - name: train
    num_bytes: 744583.7525895236
    num_examples: 6500
  - name: test
    num_bytes: 29554.247410476473
    num_examples: 258
  download_size: 496828
  dataset_size: 774138.0
- config_name: fi
  features:
  - name: id
    dtype: int64
  - name: query
    dtype: string
  - name: answers
    sequence: string
  splits:
  - name: train
    num_bytes: 648983.3530630365
    num_examples: 6500
  - name: test
    num_bytes: 25759.6469369636
    num_examples: 258
  download_size: 447412
  dataset_size: 674743.0
- config_name: fr
  features:
  - name: id
    dtype: int64
  - name: query
    dtype: string
  - name: answers
    sequence: string
  splits:
  - name: train
    num_bytes: 733298.6830423202
    num_examples: 6500
  - name: test
    num_bytes: 29106.316957679788
    num_examples: 258
  download_size: 492979
  dataset_size: 762405.0
- config_name: ja
  features:
  - name: id
    dtype: int64
  - name: query
    dtype: string
  - name: answers
    sequence: string
  splits:
  - name: train
    num_bytes: 868958.0497188517
    num_examples: 6500
  - name: test
    num_bytes: 34490.95028114827
    num_examples: 258
  download_size: 530396
  dataset_size: 903449.0
- config_name: ko
  features:
  - name: id
    dtype: int64
  - name: query
    dtype: string
  - name: answers
    sequence: string
  splits:
  - name: train
    num_bytes: 695648.1207457827
    num_examples: 6500
  - name: test
    num_bytes: 27611.879254217223
    num_examples: 258
  download_size: 461070
  dataset_size: 723260.0
- config_name: ru
  features:
  - name: id
    dtype: int64
  - name: query
    dtype: string
  - name: answers
    sequence: string
  splits:
  - name: train
    num_bytes: 1048036.0313702279
    num_examples: 6500
  - name: test
    num_bytes: 41598.96862977212
    num_examples: 258
  download_size: 613678
  dataset_size: 1089635.0
- config_name: th
  features:
  - name: id
    dtype: int64
  - name: query
    dtype: string
  - name: answers
    sequence: string
  splits:
  - name: train
    num_bytes: 1178262.0597810003
    num_examples: 6500
  - name: test
    num_bytes: 46767.940218999705
    num_examples: 258
  download_size: 609139
  dataset_size: 1225030.0
- config_name: zh
  features:
  - name: id
    dtype: int64
  - name: query
    dtype: string
  - name: answers
    sequence: string
  splits:
  - name: train
    num_bytes: 595897.4548683042
    num_examples: 6500
  - name: test
    num_bytes: 23652.54513169577
    num_examples: 258
  download_size: 428244
  dataset_size: 619550.0
configs:
- config_name: ar
  data_files:
  - split: train
    path: ar/train-*
  - split: test
    path: ar/test-*
- config_name: de
  data_files:
  - split: train
    path: de/train-*
  - split: test
    path: de/test-*
- config_name: en
  data_files:
  - split: train
    path: en/train-*
  - split: test
    path: en/test-*
- config_name: es
  data_files:
  - split: train
    path: es/train-*
  - split: test
    path: es/test-*
- config_name: fi
  data_files:
  - split: train
    path: fi/train-*
  - split: test
    path: fi/test-*
- config_name: fr
  data_files:
  - split: train
    path: fr/train-*
  - split: test
    path: fr/test-*
- config_name: ja
  data_files:
  - split: train
    path: ja/train-*
  - split: test
    path: ja/test-*
- config_name: ko
  data_files:
  - split: train
    path: ko/train-*
  - split: test
    path: ko/test-*
- config_name: ru
  data_files:
  - split: train
    path: ru/train-*
  - split: test
    path: ru/test-*
- config_name: th
  data_files:
  - split: train
    path: th/train-*
  - split: test
    path: th/test-*
- config_name: zh
  data_files:
  - split: train
    path: zh/train-*
  - split: test
    path: zh/test-*
---
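The front matter above declares one configuration per language code, each with a `train` and a `test` split. A minimal sketch of loading one configuration follows, assuming the Hugging Face `datasets` library is installed and the repository is reachable on the Hub. The helper name `load_mkqa_config` is hypothetical and not part of the dataset card.

```python
# Language codes declared as config_name entries in the front matter.
LANGUAGES = ["ar", "de", "en", "es", "fi", "fr", "ja", "ko", "ru", "th", "zh"]
REPO_ID = "nthakur/mkqa-open-domain"


def load_mkqa_config(lang, split="test"):
    """Load one language configuration (hypothetical helper; needs network access)."""
    if lang not in LANGUAGES:
        raise ValueError(f"unknown config {lang!r}; expected one of {LANGUAGES}")
    if split not in ("train", "test"):
        raise ValueError(f"unknown split {split!r}; expected 'train' or 'test'")
    # Imported lazily so the argument checks work even without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, lang, split=split)


# Usage (requires network access):
#   ds = load_mkqa_config("en", "test")
#   print(ds[0])  # one row with the declared features: id, query, answers
```

Validating the language code up front gives a clearer error than a failed Hub lookup; whether that trade-off is worthwhile depends on how the loader is used.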
Provider:
nthakur
Original information summary
Dataset overview
Dataset configurations
Arabic (ar)
- Features:
  - id: int64
  - query: string
  - answers: string (sequence)
- Splits:
  - train:
    - Bytes: 890413.4359277893
    - Examples: 6500
  - test:
    - Bytes: 35342.56407221071
    - Examples: 258
- Download size: 524191
- Dataset size: 925756.0
German (de)
- Features:
  - id: int64
  - query: string
  - answers: string (sequence)
- Splits:
  - train:
    - Bytes: 729023.3796981355
    - Examples: 6500
  - test:
    - Bytes: 28936.620301864456
    - Examples: 258
- Download size: 499976
- Dataset size: 757960.0
English (en)
- Features:
  - id: int64
  - query: string
  - answers: string (sequence)
- Splits:
  - train:
    - Bytes: 762704.4983722995
    - Examples: 6500
  - test:
    - Bytes: 30273.5016277005
    - Examples: 258
- Download size: 497421
- Dataset size: 792978.0
Spanish (es)
- Features:
  - id: int64
  - query: string
  - answers: string (sequence)
- Splits:
  - train:
    - Bytes: 744583.7525895236
    - Examples: 6500
  - test:
    - Bytes: 29554.247410476473
    - Examples: 258
- Download size: 496828
- Dataset size: 774138.0
Finnish (fi)
- Features:
  - id: int64
  - query: string
  - answers: string (sequence)
- Splits:
  - train:
    - Bytes: 648983.3530630365
    - Examples: 6500
  - test:
    - Bytes: 25759.6469369636
    - Examples: 258
- Download size: 447412
- Dataset size: 674743.0
French (fr)
- Features:
  - id: int64
  - query: string
  - answers: string (sequence)
- Splits:
  - train:
    - Bytes: 733298.6830423202
    - Examples: 6500
  - test:
    - Bytes: 29106.316957679788
    - Examples: 258
- Download size: 492979
- Dataset size: 762405.0
Japanese (ja)
- Features:
  - id: int64
  - query: string
  - answers: string (sequence)
- Splits:
  - train:
    - Bytes: 868958.0497188517
    - Examples: 6500
  - test:
    - Bytes: 34490.95028114827
    - Examples: 258
- Download size: 530396
- Dataset size: 903449.0
Korean (ko)
- Features:
  - id: int64
  - query: string
  - answers: string (sequence)
- Splits:
  - train:
    - Bytes: 695648.1207457827
    - Examples: 6500
  - test:
    - Bytes: 27611.879254217223
    - Examples: 258
- Download size: 461070
- Dataset size: 723260.0
Russian (ru)
- Features:
  - id: int64
  - query: string
  - answers: string (sequence)
- Splits:
  - train:
    - Bytes: 1048036.0313702279
    - Examples: 6500
  - test:
    - Bytes: 41598.96862977212
    - Examples: 258
- Download size: 613678
- Dataset size: 1089635.0
Thai (th)
- Features:
  - id: int64
  - query: string
  - answers: string (sequence)
- Splits:
  - train:
    - Bytes: 1178262.0597810003
    - Examples: 6500
  - test:
    - Bytes: 46767.940218999705
    - Examples: 258
- Download size: 609139
- Dataset size: 1225030.0
Chinese (zh)
- Features:
  - id: int64
  - query: string
  - answers: string (sequence)
- Splits:
  - train:
    - Bytes: 595897.4548683042
    - Examples: 6500
  - test:
    - Bytes: 23652.54513169577
    - Examples: 258
- Download size: 428244
- Dataset size: 619550.0
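Every configuration lists the same split sizes, 6,500 train and 258 test examples, and the fractional byte counts are consistent with the dataset size being divided between the two splits in proportion to their example counts. That proportionality is an observation from the numbers listed above, not something the card states. A small sketch that checks it for three of the configurations:

```python
import math

TRAIN_EXAMPLES = 6500
TEST_EXAMPLES = 258

# (dataset_size, train num_bytes, test num_bytes) copied from the listing above.
SIZES = {
    "ar": (925756.0, 890413.4359277893, 35342.56407221071),
    "en": (792978.0, 762704.4983722995, 30273.5016277005),
    "zh": (619550.0, 595897.4548683042, 23652.54513169577),
}


def proportional_split(dataset_size):
    """Divide dataset_size between train and test by example count."""
    total = TRAIN_EXAMPLES + TEST_EXAMPLES
    train = dataset_size * TRAIN_EXAMPLES / total
    return train, dataset_size - train


for lang, (size, train_bytes, test_bytes) in SIZES.items():
    est_train, est_test = proportional_split(size)
    matches = (
        math.isclose(est_train, train_bytes, rel_tol=1e-9)
        and math.isclose(est_test, test_bytes, rel_tol=1e-9)
    )
    print(lang, "consistent with proportional split:", matches)
```

If the check holds, the per-split byte counts are best read as estimates rather than measured file sizes.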
Data file paths
- Arabic (ar):
  - train: ar/train-*
  - test: ar/test-*
- German (de):
  - train: de/train-*
  - test: de/test-*
- English (en):
  - train: en/train-*
  - test: en/test-*
- Spanish (es):
  - train: es/train-*
  - test: es/test-*
- Finnish (fi):
  - train: fi/train-*
  - test: fi/test-*
- French (fr):
  - train: fr/train-*
  - test: fr/test-*
- Japanese (ja):
  - train: ja/train-*
  - test: ja/test-*
- Korean (ko):
  - train: ko/train-*
  - test: ko/test-*
- Russian (ru):
  - train: ru/train-*
  - test: ru/test-*
- Thai (th):
  - train: th/train-*
  - test: th/test-*
- Chinese (zh):
  - train: zh/train-*
  - test: zh/test-*
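Each configuration maps its splits to glob patterns of the form `<lang>/<split>-*`. A minimal sketch of how such patterns select files, using the standard-library `fnmatch` module, is shown here. The shard file names are hypothetical examples, not taken from the repository, and the Hub's own pattern resolution may differ in details.

```python
from fnmatch import fnmatch


def split_pattern(lang, split):
    """Build the glob pattern listed for a language config and split."""
    return f"{lang}/{split}-*"


def select_files(files, lang, split):
    """Return the files whose path matches the pattern for (lang, split)."""
    pattern = split_pattern(lang, split)
    return [name for name in files if fnmatch(name, pattern)]


# Hypothetical file listing, for illustration only.
files = [
    "ar/train-00000-of-00001.parquet",
    "ar/test-00000-of-00001.parquet",
    "zh/train-00000-of-00001.parquet",
]
print(select_files(files, "ar", "train"))
```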



