nthakur/nomiracl-raft-instruct
Hugging Face | Updated 2024-04-10 | Indexed 2024-06-11
Download link:
https://hf-mirror.com/datasets/nthakur/nomiracl-raft-instruct
Resource description:
---
dataset_info:
- config_name: ar
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 7475443
    num_examples: 458
  download_size: 2779767
  dataset_size: 7475443
- config_name: bn
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 22505836
    num_examples: 990
  download_size: 6728454
  dataset_size: 22505836
- config_name: de
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 2903204
    num_examples: 344
  download_size: 1279856
  dataset_size: 2903204
- config_name: en
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 5319916
    num_examples: 540
  download_size: 2255144
  dataset_size: 5319916
- config_name: es
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 4089898
    num_examples: 490
  download_size: 1799533
  dataset_size: 4089898
- config_name: fa
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 17607920
    num_examples: 1520
  download_size: 6223402
  dataset_size: 17607920
- config_name: fi
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 1736496
    num_examples: 198
  download_size: 785685
  dataset_size: 1736496
- config_name: fr
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 10145726
    num_examples: 1316
  download_size: 4381035
  dataset_size: 10145726
- config_name: hi
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 36124504
    num_examples: 2034
  download_size: 10843542
  dataset_size: 36124504
- config_name: id
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 8551467
    num_examples: 948
  download_size: 3584151
  dataset_size: 8551467
- config_name: ja
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 4354109
    num_examples: 424
  download_size: 1852613
  dataset_size: 4354109
- config_name: ko
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 28729361
    num_examples: 3154
  download_size: 12464301
  dataset_size: 28729361
- config_name: ru
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 9021009
    num_examples: 538
  download_size: 3404613
  dataset_size: 9021009
- config_name: sw
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 7885801
    num_examples: 1016
  download_size: 3082673
  dataset_size: 7885801
- config_name: te
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 22055397
    num_examples: 960
  download_size: 5986459
  dataset_size: 22055397
- config_name: th
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 12567842
    num_examples: 648
  download_size: 3797190
  dataset_size: 12567842
- config_name: yo
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 19953708
    num_examples: 2654
  download_size: 6746460
  dataset_size: 19953708
- config_name: zh
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 14473563
    num_examples: 2164
  download_size: 6741336
  dataset_size: 14473563
configs:
- config_name: ar
  data_files:
  - split: train
    path: ar/train-*
- config_name: bn
  data_files:
  - split: train
    path: bn/train-*
- config_name: de
  data_files:
  - split: train
    path: de/train-*
- config_name: en
  data_files:
  - split: train
    path: en/train-*
- config_name: es
  data_files:
  - split: train
    path: es/train-*
- config_name: fa
  data_files:
  - split: train
    path: fa/train-*
- config_name: fi
  data_files:
  - split: train
    path: fi/train-*
- config_name: fr
  data_files:
  - split: train
    path: fr/train-*
- config_name: hi
  data_files:
  - split: train
    path: hi/train-*
- config_name: id
  data_files:
  - split: train
    path: id/train-*
- config_name: ja
  data_files:
  - split: train
    path: ja/train-*
- config_name: ko
  data_files:
  - split: train
    path: ko/train-*
- config_name: ru
  data_files:
  - split: train
    path: ru/train-*
- config_name: sw
  data_files:
  - split: train
    path: sw/train-*
- config_name: te
  data_files:
  - split: train
    path: te/train-*
- config_name: th
  data_files:
  - split: train
    path: th/train-*
- config_name: yo
  data_files:
  - split: train
    path: yo/train-*
- config_name: zh
  data_files:
  - split: train
    path: zh/train-*
---
Provider:
nthakur
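The card's `configs` section maps each language code to a single `train` split stored under `<code>/train-*`. A minimal sketch of how one configuration might be loaded with the Hugging Face `datasets` library follows. The helper names `train_file_pattern` and `load_train_split` are hypothetical, and the loader assumes the `datasets` package is installed and the repository is reachable.

```python
from typing import Any

REPO_ID = "nthakur/nomiracl-raft-instruct"

# Language configs listed in the card's YAML metadata.
CONFIGS = [
    "ar", "bn", "de", "en", "es", "fa", "fi", "fr", "hi",
    "id", "ja", "ko", "ru", "sw", "te", "th", "yo", "zh",
]


def train_file_pattern(config_name: str) -> str:
    """Return the glob that the card maps to the train split of a config."""
    if config_name not in CONFIGS:
        raise ValueError(f"unknown config: {config_name!r}")
    return f"{config_name}/train-*"


def load_train_split(config_name: str) -> Any:
    """Load the train split of one language config (needs network access)."""
    # Imported lazily so the helper above stays usable without `datasets`.
    from datasets import load_dataset

    train_file_pattern(config_name)  # raises early on an unknown config
    return load_dataset(REPO_ID, config_name, split="train")
```

Calling `load_train_split("en")` should return the English train split, provided the repository still exposes the configs listed above.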
Summary of original information
Dataset overview
Dataset configurations and features
| Config name | Features |
|---|---|
| ar | query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title) |
| bn | query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title) |
| de | query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title) |
| en | query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title) |
| es | query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title) |
| fa | query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title) |
| fi | query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title) |
| fr | query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title) |
| hi | query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title) |
| id | query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title) |
| ja | query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title) |
| ko | query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title) |
| ru | query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title) |
| sw | query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title) |
| te | query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title) |
| th | query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title) |
| yo | query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title) |
| zh | query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title) |
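Since every configuration shares the same feature list, a single check can cover all languages. The sketch below validates one record against the declared schema. The function name `validate_record` and the sample record are hypothetical, and two readings are assumptions rather than facts from the card: that `negative_ids`, declared as a sequence of `'null'`, holds only empty lists or null entries, and that string fields are never null.

```python
from typing import Any, Dict, List

STRING_FIELDS = ("query_id", "prompt")
STRING_LIST_FIELDS = ("doc_ids", "positive_ids")
DOCUMENT_KEYS = ("docid", "text", "title")


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of schema problems; an empty list means the record fits."""
    problems: List[str] = []

    for name in STRING_FIELDS:
        if not isinstance(record.get(name), str):
            problems.append(f"{name}: expected a string")

    for name in STRING_LIST_FIELDS:
        value = record.get(name)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{name}: expected a list of strings")

    # Assumption: a 'null' sequence means an empty list or only null entries.
    negatives = record.get("negative_ids")
    if not isinstance(negatives, list) or any(v is not None for v in negatives):
        problems.append("negative_ids: expected an empty or all-null list")

    documents = record.get("documents")
    if not isinstance(documents, list):
        problems.append("documents: expected a list")
    else:
        for i, doc in enumerate(documents):
            if not isinstance(doc, dict):
                problems.append(f"documents[{i}]: expected a mapping")
                continue
            for key in DOCUMENT_KEYS:
                if not isinstance(doc.get(key), str):
                    problems.append(f"documents[{i}].{key}: expected a string")

    return problems


# Hypothetical record, made up for illustration only.
example = {
    "query_id": "q-1",
    "doc_ids": ["d-1"],
    "prompt": "placeholder prompt",
    "positive_ids": ["d-1"],
    "negative_ids": [],
    "documents": [{"docid": "d-1", "text": "placeholder text", "title": "placeholder title"}],
}
```

Running `validate_record(example)` on the made-up record returns an empty list, while a record with a missing or mistyped field yields one message per problem.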
Dataset size and training split information
| Config name | Train split size (bytes) | Train examples | Download size (bytes) |
|---|---|---|---|
| ar | 7475443 | 458 | 2779767 |
| bn | 22505836 | 990 | 6728454 |
| de | 2903204 | 344 | 1279856 |
| en | 5319916 | 540 | 2255144 |
| es | 4089898 | 490 | 1799533 |
| fa | 17607920 | 1520 | 6223402 |
| fi | 1736496 | 198 | 785685 |
| fr | 10145726 | 1316 | 4381035 |
| hi | 36124504 | 2034 | 10843542 |
| id | 8551467 | 948 | 3584151 |
| ja | 4354109 | 424 | 1852613 |
| ko | 28729361 | 3154 | 12464301 |
| ru | 9021009 | 538 | 3404613 |
| sw | 7885801 | 1016 | 3082673 |
| te | 22055397 | 960 | 5986459 |
| th | 12567842 | 648 | 3797190 |
| yo | 19953708 | 2654 | 6746460 |
| zh | 14473563 | 2164 | 6741336 |
The tables above summarize the features, sizes, and training split details of each language configuration.
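The per-config figures lend themselves to a few derived numbers, such as the total example count or the average size of one example. The sketch below copies the values from the size table. The names `TRAIN_STATS`, `total_examples`, `largest_config`, and `bytes_per_example` are hypothetical helpers, not part of the dataset.

```python
from typing import Dict, Tuple

# config -> (train size in bytes, train examples, download size in bytes)
TRAIN_STATS: Dict[str, Tuple[int, int, int]] = {
    "ar": (7475443, 458, 2779767),
    "bn": (22505836, 990, 6728454),
    "de": (2903204, 344, 1279856),
    "en": (5319916, 540, 2255144),
    "es": (4089898, 490, 1799533),
    "fa": (17607920, 1520, 6223402),
    "fi": (1736496, 198, 785685),
    "fr": (10145726, 1316, 4381035),
    "hi": (36124504, 2034, 10843542),
    "id": (8551467, 948, 3584151),
    "ja": (4354109, 424, 1852613),
    "ko": (28729361, 3154, 12464301),
    "ru": (9021009, 538, 3404613),
    "sw": (7885801, 1016, 3082673),
    "te": (22055397, 960, 5986459),
    "th": (12567842, 648, 3797190),
    "yo": (19953708, 2654, 6746460),
    "zh": (14473563, 2164, 6741336),
}


def total_examples(stats: Dict[str, Tuple[int, int, int]]) -> int:
    """Sum the train example counts over all configs."""
    return sum(examples for _, examples, _ in stats.values())


def largest_config(stats: Dict[str, Tuple[int, int, int]], column: int) -> str:
    """Return the config with the largest value in the given tuple column."""
    return max(stats, key=lambda name: stats[name][column])


def bytes_per_example(stats: Dict[str, Tuple[int, int, int]]) -> Dict[str, float]:
    """Average uncompressed train bytes per example for each config."""
    return {name: size / examples for name, (size, examples, _) in stats.items()}
```

For instance, `largest_config(TRAIN_STATS, 1)` picks the config with the most train examples, and `largest_config(TRAIN_STATS, 0)` the one with the largest train split in bytes.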



