nthakur/mkqa-raft-instruct
Source: Hugging Face. Updated 2024-05-16; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/nthakur/mkqa-raft-instruct
Resource description:
---
dataset_info:
- config_name: ar
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gold_answer
    sequence: string
  splits:
  - name: train
    num_bytes: 102496608
    num_examples: 6500
  download_size: 50730638
  dataset_size: 102496608
- config_name: de
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gold_answer
    sequence: string
  splits:
  - name: train
    num_bytes: 62305522
    num_examples: 6500
  download_size: 37035774
  dataset_size: 62305522
- config_name: en
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gold_answer
    sequence: string
  splits:
  - name: train
    num_bytes: 75045834
    num_examples: 6500
  download_size: 43109259
  dataset_size: 75045834
- config_name: es
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gold_answer
    sequence: string
  splits:
  - name: train
    num_bytes: 71034084
    num_examples: 6500
  download_size: 41873672
  dataset_size: 71034084
- config_name: fi
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gold_answer
    sequence: string
  splits:
  - name: train
    num_bytes: 61394960
    num_examples: 6500
  download_size: 36958512
  dataset_size: 61394960
- config_name: fr
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gold_answer
    sequence: string
  splits:
  - name: train
    num_bytes: 61439186
    num_examples: 6500
  download_size: 35830558
  dataset_size: 61439186
- config_name: ja
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gold_answer
    sequence: string
  splits:
  - name: train
    num_bytes: 76017085
    num_examples: 6500
  download_size: 42305829
  dataset_size: 76017085
- config_name: ko
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gold_answer
    sequence: string
  splits:
  - name: train
    num_bytes: 67618027
    num_examples: 6500
  download_size: 39510692
  dataset_size: 67618027
- config_name: ru
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gold_answer
    sequence: string
  splits:
  - name: train
    num_bytes: 113219755
    num_examples: 6500
  download_size: 56978371
  dataset_size: 113219755
- config_name: th
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gold_answer
    sequence: string
  splits:
  - name: train
    num_bytes: 139954459
    num_examples: 6500
  download_size: 56017120
  dataset_size: 139954459
- config_name: zh
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gold_answer
    sequence: string
  splits:
  - name: train
    num_bytes: 58135559
    num_examples: 6500
  download_size: 39175049
  dataset_size: 58135559
configs:
- config_name: ar
  data_files:
  - split: train
    path: ar/train-*
- config_name: de
  data_files:
  - split: train
    path: de/train-*
- config_name: en
  data_files:
  - split: train
    path: en/train-*
- config_name: es
  data_files:
  - split: train
    path: es/train-*
- config_name: fi
  data_files:
  - split: train
    path: fi/train-*
- config_name: fr
  data_files:
  - split: train
    path: fr/train-*
- config_name: ja
  data_files:
  - split: train
    path: ja/train-*
- config_name: ko
  data_files:
  - split: train
    path: ko/train-*
- config_name: ru
  data_files:
  - split: train
    path: ru/train-*
- config_name: th
  data_files:
  - split: train
    path: th/train-*
- config_name: zh
  data_files:
  - split: train
    path: zh/train-*
---
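This is a minimal sketch for loading one language configuration. It assumes the Hugging Face `datasets` library is installed and the Hub is reachable, and it passes the config name as the second argument of `load_dataset`. The helper `load_mkqa_raft` and the `CONFIGS` tuple are illustrative names; they are not part of the dataset card.

```python
from typing import Any

# Config names taken from the `configs` section of the card above.
CONFIGS = ("ar", "de", "en", "es", "fi", "fr", "ja", "ko", "ru", "th", "zh")
REPO_ID = "nthakur/mkqa-raft-instruct"


def load_mkqa_raft(lang: str, split: str = "train") -> Any:
    """Load one language config (hypothetical helper, not an official API).

    Unknown config or split names raise ValueError before any network access.
    """
    if lang not in CONFIGS:
        raise ValueError(f"unknown config {lang!r}; expected one of {CONFIGS}")
    if split != "train":
        # The card declares only a `train` split for every config.
        raise ValueError(f"unknown split {split!r}; only 'train' is declared")
    # Imported lazily so the name checks above work without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, lang, split=split)
```

For example, `load_mkqa_raft("en")` should return a `Dataset` holding the 6,500 training rows, with the five columns the card lists: prompt, query_id, doc_ids, documents and gold_answer.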
Provided by:
nthakur
Summary of the original information
Dataset Overview
Dataset Configurations
- config_name: the language configuration of the dataset, one of ar, de, en, es, fi, fr, ja, ko, ru, th, zh.
- features: the fields of each example:
  - prompt: string.
  - query_id: string.
  - doc_ids: sequence of strings.
  - documents: list of records, each containing:
    - docid: string.
    - text: string.
    - title: string.
  - gold_answer: sequence of strings.
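The schema above can be mirrored with `TypedDict` classes and checked with a small validator. This is a sketch only. The class names `Document` and `Record`, the function `validate_record` and the sample record are hypothetical. They illustrate the field types the card declares and are not taken from the dataset itself.

```python
from typing import List, TypedDict


class Document(TypedDict):
    docid: str
    text: str
    title: str


class Record(TypedDict):
    prompt: str
    query_id: str
    doc_ids: List[str]
    documents: List[Document]
    gold_answer: List[str]


def _is_str_list(value: object) -> bool:
    """True if value is a list whose items are all strings."""
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def validate_record(record: dict) -> bool:
    """Return True if record has exactly the fields and types listed in the card."""
    expected = {"prompt", "query_id", "doc_ids", "documents", "gold_answer"}
    if set(record) != expected:
        return False
    if not isinstance(record["prompt"], str) or not isinstance(record["query_id"], str):
        return False
    if not _is_str_list(record["doc_ids"]) or not _is_str_list(record["gold_answer"]):
        return False
    docs = record["documents"]
    if not isinstance(docs, list):
        return False
    # Every document must be a dict with exactly docid/text/title, all strings.
    return all(
        isinstance(d, dict)
        and set(d) == {"docid", "text", "title"}
        and all(isinstance(d[k], str) for k in ("docid", "text", "title"))
        for d in docs
    )


# Hypothetical record for illustration only; real prompts and documents differ.
example: Record = {
    "prompt": "Answer the question using the documents below.",
    "query_id": "q-0001",
    "doc_ids": ["d-1"],
    "documents": [{"docid": "d-1", "text": "Paris is the capital of France.", "title": "Paris"}],
    "gold_answer": ["Paris"],
}
```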
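Dataset Splits
- split: the dataset has a single split, the training set (train).
- num_bytes: size of the training split, in bytes.
- num_examples: number of examples in the training split.
Dataset Size and Download Size
- download_size: size of the files to download, in bytes.
- dataset_size: size of the prepared dataset, in bytes; with only a train split it equals num_bytes.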
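In Hugging Face dataset metadata, `dataset_size` is the combined `num_bytes` of all splits. Because each config here has a single `train` split, the two values coincide. The sketch below uses the `ar` figures from the card's YAML. The helper names are illustrative. The "expansion ratio" is simply prepared bytes per downloaded byte, an assumption about how one might compare the two sizes rather than a field of the card.

```python
from typing import Dict

# Figures for the `ar` config, copied from the card's YAML metadata.
ar_splits: Dict[str, int] = {"train": 102496608}  # split name -> num_bytes
ar_num_examples = 6500
ar_download_size = 50730638
ar_dataset_size = 102496608


def dataset_size_from_splits(splits: Dict[str, int]) -> int:
    """dataset_size is the sum of num_bytes over all splits."""
    return sum(splits.values())


def expansion_ratio(dataset_size: int, download_size: int) -> float:
    """Prepared bytes per downloaded byte."""
    return dataset_size / download_size


def bytes_per_example(num_bytes: int, num_examples: int) -> float:
    """Average prepared size of one example in a split."""
    return num_bytes / num_examples
```

For `ar`, this gives an expansion ratio of roughly 2.02 and about 15,769 bytes per training example.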
Data File Paths
- path: location of the training data files, in the form language_code/train-* (for example, en/train-*).
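The `train-*` path is a glob pattern: files under the language directory whose names start with `train-` make up that config's train split. Below is a minimal sketch using the standard-library `fnmatch.fnmatchcase`. The shard filenames are hypothetical, since the card does not list the actual file names.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def train_pattern(lang: str) -> str:
    """Build the data_files pattern used by each config, e.g. 'en/train-*'."""
    return f"{lang}/train-*"


def train_files(lang: str, repo_files: Iterable[str]) -> List[str]:
    """Select repository paths matched by the config's train pattern.

    Note: fnmatchcase's `*` also matches '/', so this is looser than a
    directory-aware glob.
    """
    pattern = train_pattern(lang)
    return sorted(p for p in repo_files if fnmatchcase(p, pattern))


# Hypothetical repository listing for illustration only.
files = [
    "en/train-00000-of-00002.parquet",
    "en/train-00001-of-00002.parquet",
    "de/train-00000-of-00001.parquet",
    "README.md",
]
```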
Dataset Details
| config_name | num_bytes (bytes) | num_examples | download_size (bytes) | dataset_size (bytes) |
|---|---|---|---|---|
| ar | 102496608 | 6500 | 50730638 | 102496608 |
| de | 62305522 | 6500 | 37035774 | 62305522 |
| en | 75045834 | 6500 | 43109259 | 75045834 |
| es | 71034084 | 6500 | 41873672 | 71034084 |
| fi | 61394960 | 6500 | 36958512 | 61394960 |
| fr | 61439186 | 6500 | 35830558 | 61439186 |
| ja | 76017085 | 6500 | 42305829 | 76017085 |
| ko | 67618027 | 6500 | 39510692 | 67618027 |
| ru | 113219755 | 6500 | 56978371 | 113219755 |
| th | 139954459 | 6500 | 56017120 | 139954459 |
| zh | 58135559 | 6500 | 39175049 | 58135559 |
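The per-config figures in the table can be aggregated with a few lines of Python. This is a sketch: `STATS`, `totals` and `largest_config` are illustrative names, and the numbers are copied from the table. `dataset_size` is omitted because it equals `num_bytes` in every row.

```python
from typing import Dict, Tuple

# (num_bytes, num_examples, download_size) per config, copied from the table.
STATS: Dict[str, Tuple[int, int, int]] = {
    "ar": (102496608, 6500, 50730638),
    "de": (62305522, 6500, 37035774),
    "en": (75045834, 6500, 43109259),
    "es": (71034084, 6500, 41873672),
    "fi": (61394960, 6500, 36958512),
    "fr": (61439186, 6500, 35830558),
    "ja": (76017085, 6500, 42305829),
    "ko": (67618027, 6500, 39510692),
    "ru": (113219755, 6500, 56978371),
    "th": (139954459, 6500, 56017120),
    "zh": (58135559, 6500, 39175049),
}


def totals(stats: Dict[str, Tuple[int, int, int]]) -> Tuple[int, int, int]:
    """Sum num_bytes, num_examples and download_size across all configs."""
    num_bytes = sum(s[0] for s in stats.values())
    num_examples = sum(s[1] for s in stats.values())
    download = sum(s[2] for s in stats.values())
    return num_bytes, num_examples, download


def largest_config(stats: Dict[str, Tuple[int, int, int]]) -> str:
    """Config with the largest num_bytes."""
    return max(stats, key=lambda k: stats[k][0])
```

Summed over all eleven configs, this gives 71,500 training examples and 888,661,079 bytes of prepared data, with 479,525,474 bytes to download. `th` is the largest config even though every config has 6,500 examples. This is plausibly due in part to Thai characters taking 3 bytes each in UTF-8.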
The information above describes the structure, size, and per-language distribution of the dataset.



