nthakur/nomiracl-raft-instruct

Source: Hugging Face. Updated 2024-04-10; added to this index 2024-06-11.
Download link:
https://hf-mirror.com/datasets/nthakur/nomiracl-raft-instruct
Resource summary (dataset card metadata):
---
dataset_info:
- config_name: ar
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 7475443
    num_examples: 458
  download_size: 2779767
  dataset_size: 7475443
- config_name: bn
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 22505836
    num_examples: 990
  download_size: 6728454
  dataset_size: 22505836
- config_name: de
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 2903204
    num_examples: 344
  download_size: 1279856
  dataset_size: 2903204
- config_name: en
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 5319916
    num_examples: 540
  download_size: 2255144
  dataset_size: 5319916
- config_name: es
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 4089898
    num_examples: 490
  download_size: 1799533
  dataset_size: 4089898
- config_name: fa
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 17607920
    num_examples: 1520
  download_size: 6223402
  dataset_size: 17607920
- config_name: fi
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 1736496
    num_examples: 198
  download_size: 785685
  dataset_size: 1736496
- config_name: fr
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 10145726
    num_examples: 1316
  download_size: 4381035
  dataset_size: 10145726
- config_name: hi
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 36124504
    num_examples: 2034
  download_size: 10843542
  dataset_size: 36124504
- config_name: id
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 8551467
    num_examples: 948
  download_size: 3584151
  dataset_size: 8551467
- config_name: ja
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 4354109
    num_examples: 424
  download_size: 1852613
  dataset_size: 4354109
- config_name: ko
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 28729361
    num_examples: 3154
  download_size: 12464301
  dataset_size: 28729361
- config_name: ru
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 9021009
    num_examples: 538
  download_size: 3404613
  dataset_size: 9021009
- config_name: sw
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 7885801
    num_examples: 1016
  download_size: 3082673
  dataset_size: 7885801
- config_name: te
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 22055397
    num_examples: 960
  download_size: 5986459
  dataset_size: 22055397
- config_name: th
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 12567842
    num_examples: 648
  download_size: 3797190
  dataset_size: 12567842
- config_name: yo
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 19953708
    num_examples: 2654
  download_size: 6746460
  dataset_size: 19953708
- config_name: zh
  features:
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: prompt
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  splits:
  - name: train
    num_bytes: 14473563
    num_examples: 2164
  download_size: 6741336
  dataset_size: 14473563
configs:
- config_name: ar
  data_files:
  - split: train
    path: ar/train-*
- config_name: bn
  data_files:
  - split: train
    path: bn/train-*
- config_name: de
  data_files:
  - split: train
    path: de/train-*
- config_name: en
  data_files:
  - split: train
    path: en/train-*
- config_name: es
  data_files:
  - split: train
    path: es/train-*
- config_name: fa
  data_files:
  - split: train
    path: fa/train-*
- config_name: fi
  data_files:
  - split: train
    path: fi/train-*
- config_name: fr
  data_files:
  - split: train
    path: fr/train-*
- config_name: hi
  data_files:
  - split: train
    path: hi/train-*
- config_name: id
  data_files:
  - split: train
    path: id/train-*
- config_name: ja
  data_files:
  - split: train
    path: ja/train-*
- config_name: ko
  data_files:
  - split: train
    path: ko/train-*
- config_name: ru
  data_files:
  - split: train
    path: ru/train-*
- config_name: sw
  data_files:
  - split: train
    path: sw/train-*
- config_name: te
  data_files:
  - split: train
    path: te/train-*
- config_name: th
  data_files:
  - split: train
    path: th/train-*
- config_name: yo
  data_files:
  - split: train
    path: yo/train-*
- config_name: zh
  data_files:
  - split: train
    path: zh/train-*
---
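The `configs` block maps each language config to a single `train` split whose `data_files` pattern is `<lang>/train-*`. In normal use the Hugging Face `datasets` library resolves this for you, e.g. `load_dataset("nthakur/nomiracl-raft-instruct", "en", split="train")`. The sketch below only illustrates that mapping offline with the standard `fnmatch` module; the helper `files_for` is hypothetical and not part of the dataset or of any library.

```python
from fnmatch import fnmatchcase

# Language configs declared in the `configs` section of the card above.
LANGS = ["ar", "bn", "de", "en", "es", "fa", "fi", "fr", "hi",
         "id", "ja", "ko", "ru", "sw", "te", "th", "yo", "zh"]

# Each config has one "train" split whose data_files pattern is "<lang>/train-*".
DATA_FILES = {lang: {"train": f"{lang}/train-*"} for lang in LANGS}


def files_for(config: str, split: str, repo_files: list[str]) -> list[str]:
    """Hypothetical helper: return the repo files matched by a config's split pattern."""
    if config not in DATA_FILES:
        raise KeyError(f"unknown config: {config!r}")
    pattern = DATA_FILES[config][split]
    # fnmatchcase matches the whole path, case-sensitively, on every platform.
    return sorted(f for f in repo_files if fnmatchcase(f, pattern))
```

For example, given a hypothetical file listing `["README.md", "ar/train-00000-of-00001.parquet", "bn/train-00000-of-00001.parquet"]`, `files_for("ar", "train", ...)` keeps only the `ar` shard. The shard filenames are illustrative assumptions; the card only fixes the `train-*` prefix.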
Provider: nthakur
Summary of the original information

Dataset overview

Dataset configurations and features

Config  Features
ar      query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title)
bn      query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title)
de      query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title)
en      query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title)
es      query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title)
fa      query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title)
fi      query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title)
fr      query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title)
hi      query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title)
id      query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title)
ja      query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title)
ko      query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title)
ru      query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title)
sw      query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title)
te      query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title)
th      query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title)
yo      query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title)
zh      query_id, doc_ids, prompt, positive_ids, negative_ids, documents (docid, text, title)
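All 18 configs share one record schema. The sketch below models it with `TypedDict` and partitions a record's `documents` by `positive_ids`. It rests on two assumptions the card does not state: that `positive_ids` refer to `documents[].docid` values, and that `negative_ids`, declared as a sequence of `'null'`, carries no usable IDs (a `null` dtype usually means no non-null values were seen when the schema was inferred). The names `Document`, `Record`, and `split_documents` are hypothetical.

```python
from typing import TypedDict


class Document(TypedDict):
    docid: str
    text: str
    title: str


class Record(TypedDict):
    query_id: str
    doc_ids: list[str]
    prompt: str
    positive_ids: list[str]
    # Declared as `sequence: 'null'` in the card; assumed to hold no usable IDs.
    negative_ids: list[None]
    documents: list[Document]


def split_documents(record: Record) -> tuple[list[Document], list[Document]]:
    """Partition documents into (positive, other).

    Assumption: positive_ids contains documents[].docid values.
    """
    positives = set(record["positive_ids"])
    pos = [d for d in record["documents"] if d["docid"] in positives]
    other = [d for d in record["documents"] if d["docid"] not in positives]
    return pos, other
```

With a made-up record whose `documents` have docids `d1`, `d2`, `d3` and whose `positive_ids` is `["d2"]`, the function returns `d2` as the only positive and keeps `d1`, `d3` in the original order.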

Dataset size and training-split statistics

Config  Train size (bytes)  Train examples  Download size (bytes)
ar      7475443             458             2779767
bn      22505836            990             6728454
de      2903204             344             1279856
en      5319916             540             2255144
es      4089898             490             1799533
fa      17607920            1520            6223402
fi      1736496             198             785685
fr      10145726            1316            4381035
hi      36124504            2034            10843542
id      8551467             948             3584151
ja      4354109             424             1852613
ko      28729361            3154            12464301
ru      9021009             538             3404613
sw      7885801             1016            3082673
te      22055397            960             5986459
th      12567842            648             3797190
yo      19953708            2654            6746460
zh      14473563            2164            6741336
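The rows above can be aggregated directly. The sketch below copies them verbatim and computes totals, average train bytes per example, and the ratio of train size to download size. Reading that ratio as a compression factor assumes `download_size` measures the compressed data files, which is how Hugging Face dataset cards typically use the field; the card itself does not say so.

```python
# Rows copied from the table above:
# (config, train size in bytes, train examples, download size in bytes)
SIZES = [
    ("ar", 7475443, 458, 2779767),
    ("bn", 22505836, 990, 6728454),
    ("de", 2903204, 344, 1279856),
    ("en", 5319916, 540, 2255144),
    ("es", 4089898, 490, 1799533),
    ("fa", 17607920, 1520, 6223402),
    ("fi", 1736496, 198, 785685),
    ("fr", 10145726, 1316, 4381035),
    ("hi", 36124504, 2034, 10843542),
    ("id", 8551467, 948, 3584151),
    ("ja", 4354109, 424, 1852613),
    ("ko", 28729361, 3154, 12464301),
    ("ru", 9021009, 538, 3404613),
    ("sw", 7885801, 1016, 3082673),
    ("te", 22055397, 960, 5986459),
    ("th", 12567842, 648, 3797190),
    ("yo", 19953708, 2654, 6746460),
    ("zh", 14473563, 2164, 6741336),
]


def totals(rows):
    """Sum train bytes, train examples, and download bytes across configs."""
    return (
        sum(r[1] for r in rows),
        sum(r[2] for r in rows),
        sum(r[3] for r in rows),
    )


def per_config(rows):
    """Map config -> (train bytes per example, train size / download size)."""
    # The ratio is read as compression only if download_size counts compressed
    # files; that is an assumption, not something the card states.
    return {name: (size / n, size / dl) for name, size, n, dl in rows}


if __name__ == "__main__":
    _, total_examples, _ = totals(SIZES)
    print(f"{len(SIZES)} configs, {total_examples} train examples")
    for name, (bytes_per_ex, ratio) in per_config(SIZES).items():
        print(f"{name}: {bytes_per_ex:,.0f} bytes/example, {ratio:.2f}x train/download")
```

Sorting the output by examples or bytes makes the spread visible: `ko` has the most training examples, while `hi` is the largest config by size.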

The tables above summarize the features, size, and training-split details of each language configuration.
