
nthakur/miracl-raft-instruct-v0.2

Source: Hugging Face | Updated: 2024-05-01 | Indexed: 2024-06-12
Download link:
https://hf-mirror.com/datasets/nthakur/miracl-raft-instruct-v0.2
Description:
---
dataset_info:
- config_name: ar
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 12924404
    num_examples: 3430
  download_size: 5353936
  dataset_size: 12924404
- config_name: bn
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 10401633
    num_examples: 1619
  download_size: 3489044
  dataset_size: 10401633
- config_name: en
  features:
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 9339231
    num_examples: 2848
  download_size: 4581070
  dataset_size: 9339231
- config_name: es
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 9318622
    num_examples: 2144
  download_size: 4737737
  dataset_size: 9318622
- config_name: fa
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 8246011
    num_examples: 2103
  download_size: 3254151
  dataset_size: 8246011
- config_name: fi
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 6118218
    num_examples: 2841
  download_size: 2873116
  dataset_size: 6118218
- config_name: fr
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 3096232
    num_examples: 1089
  download_size: 1500954
  dataset_size: 3096232
- config_name: hi
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 6604164
    num_examples: 1164
  download_size: 0
  dataset_size: 6604164
- config_name: id
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 13923027
    num_examples: 4048
  download_size: 6636098
  dataset_size: 13923027
- config_name: ja
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 10055584
    num_examples: 3456
  download_size: 4695623
  dataset_size: 10055584
- config_name: ko
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 2283556
    num_examples: 839
  download_size: 0
  dataset_size: 2283556
- config_name: ru
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 19061453
    num_examples: 4567
  download_size: 8297600
  dataset_size: 19061453
- config_name: sw
  features:
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 2806028
    num_examples: 1787
  download_size: 0
  dataset_size: 2806028
- config_name: te
  features:
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 10823015
    num_examples: 3255
  download_size: 3411451
  dataset_size: 10823015
- config_name: th
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 14520505
    num_examples: 2954
  download_size: 4809627
  dataset_size: 14520505
- config_name: zh
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 3395014
    num_examples: 1310
  download_size: 1780936
  dataset_size: 3395014
configs:
- config_name: ar
  data_files:
  - split: train
    path: ar/train-*
- config_name: bn
  data_files:
  - split: train
    path: bn/train-*
- config_name: en
  data_files:
  - split: train
    path: en/train-*
- config_name: es
  data_files:
  - split: train
    path: es/train-*
- config_name: fa
  data_files:
  - split: train
    path: fa/train-*
- config_name: fi
  data_files:
  - split: train
    path: fi/train-*
- config_name: fr
  data_files:
  - split: train
    path: fr/train-*
- config_name: hi
  data_files:
  - split: train
    path: hi/train-*
- config_name: id
  data_files:
  - split: train
    path: id/train-*
- config_name: ja
  data_files:
  - split: train
    path: ja/train-*
- config_name: ko
  data_files:
  - split: train
    path: ko/train-*
- config_name: ru
  data_files:
  - split: train
    path: ru/train-*
- config_name: sw
  data_files:
  - split: train
    path: sw/train-*
- config_name: te
  data_files:
  - split: train
    path: te/train-*
- config_name: th
  data_files:
  - split: train
    path: th/train-*
- config_name: zh
  data_files:
  - split: train
    path: zh/train-*
---

# Dataset Card for "miracl-raft-instruct-v0.2"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Provider:
nthakur
Summary of the original information

Dataset overview

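The dataset is split into 16 language configurations, and every configuration has the same features:

  • prompt: string.
  • query_id: string.
  • positive_ids: sequence of strings.
  • negative_ids: sequence whose element type is recorded as null (the metadata gives sequence: 'null').
  • outputs: list of records, each containing:
    • model: string.
    • output: string.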
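To make the nested outputs field concrete, here is a minimal sketch that models one record as Python TypedDicts mirroring the feature list above. The field names come from the metadata; the example values (prompt text, IDs, model name, output text) are invented placeholders rather than real rows, and outputs_by_model is a hypothetical helper, not part of any published API.

```python
from typing import Dict, List, TypedDict


class ModelOutput(TypedDict):
    model: str   # name of the model that produced the output
    output: str  # the generated text


class Record(TypedDict):
    prompt: str
    query_id: str
    positive_ids: List[str]
    # Element type is 'null' in the metadata, so no typed values are assumed here.
    negative_ids: List[None]
    outputs: List[ModelOutput]


def outputs_by_model(record: Record) -> Dict[str, str]:
    """Hypothetical helper: map each model name in `outputs` to its text."""
    return {item["model"]: item["output"] for item in record["outputs"]}


# Placeholder record; every value below is invented for illustration.
example: Record = {
    "prompt": "Answer the question using the passages below. ...",
    "query_id": "q-0001",
    "positive_ids": ["doc-12#3", "doc-40#0"],
    "negative_ids": [],
    "outputs": [{"model": "model-a", "output": "An example answer."}],
}

print(outputs_by_model(example))
```

Keying by model name assumes each model appears at most once per record; if a record held several outputs from the same model, a list-valued mapping would be needed instead.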

Train-split size, download size, and number of training examples for each language configuration:

Config  Train split size (bytes)  Download size (bytes)  Train examples
ar      12924404                  5353936                3430
bn      10401633                  3489044                1619
en      9339231                   4581070                2848
es      9318622                   4737737                2144
fa      8246011                   3254151                2103
fi      6118218                   2873116                2841
fr      3096232                   1500954                1089
hi      6604164                   0                      1164
id      13923027                  6636098                4048
ja      10055584                  4695623                3456
ko      2283556                   0                      839
ru      19061453                  8297600                4567
sw      2806028                   0                      1787
te      10823015                  3411451                3255
th      14520505                  4809627                2954
zh      3395014                   1780936                1310
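As a quick sanity check on the table, the snippet below recomputes a few aggregates from its rows. Note that hi, ko, and sw report a download size of 0 even though their train splits are non-empty (roughly 2.3 MB to 6.6 MB), so that value appears not to have been recorded in the metadata; the sketch only flags those rows and does not guess a size.

```python
from typing import List, Tuple

# Rows copied from the table above:
# (config, train split size in bytes, download size in bytes, train examples)
TABLE: List[Tuple[str, int, int, int]] = [
    ("ar", 12924404, 5353936, 3430),
    ("bn", 10401633, 3489044, 1619),
    ("en", 9339231, 4581070, 2848),
    ("es", 9318622, 4737737, 2144),
    ("fa", 8246011, 3254151, 2103),
    ("fi", 6118218, 2873116, 2841),
    ("fr", 3096232, 1500954, 1089),
    ("hi", 6604164, 0, 1164),
    ("id", 13923027, 6636098, 4048),
    ("ja", 10055584, 4695623, 3456),
    ("ko", 2283556, 0, 839),
    ("ru", 19061453, 8297600, 4567),
    ("sw", 2806028, 0, 1787),
    ("te", 10823015, 3411451, 3255),
    ("th", 14520505, 4809627, 2954),
    ("zh", 3395014, 1780936, 1310),
]

# Total number of training examples across all configurations.
total_examples = sum(examples for _, _, _, examples in TABLE)

# Configs whose download size is recorded as 0 despite a non-empty train split.
missing_download = [cfg for cfg, size, dl, _ in TABLE if dl == 0 and size > 0]

# Configs with the most and the fewest training examples.
largest = max(TABLE, key=lambda row: row[3])[0]
smallest = min(TABLE, key=lambda row: row[3])[0]

print(f"{len(TABLE)} configs, {total_examples} train examples in total")
print(f"largest: {largest}, smallest: {smallest}")
print(f"download size recorded as 0: {missing_download}")
```

Run as-is, it reports 16 configurations with 39454 training examples in total, ru as the largest and ko as the smallest configuration, and hi, ko, and sw as the rows with a zero download size.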

The training data files for each language configuration follow the path pattern <language>/train-* (for example, ar/train-*).
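With the Hugging Face datasets library, a configuration is normally selected by name, e.g. load_dataset("nthakur/miracl-raft-instruct-v0.2", "ar", split="train"), and the library resolves the <language>/train-* pattern from the card's configs section on its own. The offline sketch below only illustrates how such a glob picks out one configuration's files, using the standard-library fnmatch module. The shard file names are hypothetical, and the actual names in the repository may differ.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# The 16 language configurations listed in the metadata.
CONFIGS = ["ar", "bn", "en", "es", "fa", "fi", "fr", "hi",
           "id", "ja", "ko", "ru", "sw", "te", "th", "zh"]


def train_glob(config: str) -> str:
    """Return the train-split path pattern for a configuration, e.g. 'ar/train-*'."""
    return f"{config}/train-*"


def files_for_config(config: str, repo_files: List[str]) -> List[str]:
    """Select the repository files matching a configuration's train-split glob."""
    pattern = train_glob(config)
    # fnmatchcase keeps matching case-sensitive on every platform.
    return sorted(f for f in repo_files if fnmatchcase(f, pattern))


# Hypothetical repository listing; real shard names may differ.
repo_files = [
    "README.md",
    "ar/train-00000-of-00001.parquet",
    "en/train-00000-of-00001.parquet",
    "zh/train-00000-of-00001.parquet",
]

mapping: Dict[str, List[str]] = {c: files_for_config(c, repo_files) for c in CONFIGS}
print(mapping["ar"])
```

Because each pattern is anchored at its own language directory, a file under one configuration's folder never matches another configuration's glob.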
