nthakur/miracl-raft-instruct-v0.2
Source: Hugging Face. Updated 2024-05-01; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/nthakur/miracl-raft-instruct-v0.2
Resource description:
---
dataset_info:
- config_name: ar
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 12924404
    num_examples: 3430
  download_size: 5353936
  dataset_size: 12924404
- config_name: bn
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 10401633
    num_examples: 1619
  download_size: 3489044
  dataset_size: 10401633
- config_name: en
  features:
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 9339231
    num_examples: 2848
  download_size: 4581070
  dataset_size: 9339231
- config_name: es
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 9318622
    num_examples: 2144
  download_size: 4737737
  dataset_size: 9318622
- config_name: fa
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 8246011
    num_examples: 2103
  download_size: 3254151
  dataset_size: 8246011
- config_name: fi
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 6118218
    num_examples: 2841
  download_size: 2873116
  dataset_size: 6118218
- config_name: fr
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 3096232
    num_examples: 1089
  download_size: 1500954
  dataset_size: 3096232
- config_name: hi
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 6604164
    num_examples: 1164
  download_size: 0
  dataset_size: 6604164
- config_name: id
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 13923027
    num_examples: 4048
  download_size: 6636098
  dataset_size: 13923027
- config_name: ja
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 10055584
    num_examples: 3456
  download_size: 4695623
  dataset_size: 10055584
- config_name: ko
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 2283556
    num_examples: 839
  download_size: 0
  dataset_size: 2283556
- config_name: ru
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 19061453
    num_examples: 4567
  download_size: 8297600
  dataset_size: 19061453
- config_name: sw
  features:
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 2806028
    num_examples: 1787
  download_size: 0
  dataset_size: 2806028
- config_name: te
  features:
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 10823015
    num_examples: 3255
  download_size: 3411451
  dataset_size: 10823015
- config_name: th
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 14520505
    num_examples: 2954
  download_size: 4809627
  dataset_size: 14520505
- config_name: zh
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  - name: outputs
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  splits:
  - name: train
    num_bytes: 3395014
    num_examples: 1310
  download_size: 1780936
  dataset_size: 3395014
configs:
- config_name: ar
  data_files:
  - split: train
    path: ar/train-*
- config_name: bn
  data_files:
  - split: train
    path: bn/train-*
- config_name: en
  data_files:
  - split: train
    path: en/train-*
- config_name: es
  data_files:
  - split: train
    path: es/train-*
- config_name: fa
  data_files:
  - split: train
    path: fa/train-*
- config_name: fi
  data_files:
  - split: train
    path: fi/train-*
- config_name: fr
  data_files:
  - split: train
    path: fr/train-*
- config_name: hi
  data_files:
  - split: train
    path: hi/train-*
- config_name: id
  data_files:
  - split: train
    path: id/train-*
- config_name: ja
  data_files:
  - split: train
    path: ja/train-*
- config_name: ko
  data_files:
  - split: train
    path: ko/train-*
- config_name: ru
  data_files:
  - split: train
    path: ru/train-*
- config_name: sw
  data_files:
  - split: train
    path: sw/train-*
- config_name: te
  data_files:
  - split: train
    path: te/train-*
- config_name: th
  data_files:
  - split: train
    path: th/train-*
- config_name: zh
  data_files:
  - split: train
    path: zh/train-*
---
# Dataset Card for "miracl-raft-instruct-v0.2"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Provider:
nthakur
Summary of original information
Dataset overview
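This dataset contains 16 language configurations. Every configuration has the following features:
- prompt: a string.
- query_id: a string.
- positive_ids: a sequence of strings.
- negative_ids: a sequence whose element type is null (declared as sequence: 'null' in the metadata).
- outputs: a list of records, each containing:
  - model: a string.
  - output: a string.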
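To make the feature layout above concrete, here is a minimal Python sketch that checks whether a record has the listed shape. The function name `is_valid_record` and every value in `example` are hypothetical placeholders, not taken from the dataset. Treating `negative_ids` as a list that is empty or holds only `None` is an assumption based on the `'null'` element type.

```python
from typing import Any


def is_str_list(value: Any) -> bool:
    """True if value is a list whose items are all strings."""
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def is_valid_record(record: dict[str, Any]) -> bool:
    """Check a record against the feature layout described in the summary."""
    if not isinstance(record.get("prompt"), str):
        return False
    if not isinstance(record.get("query_id"), str):
        return False
    if not is_str_list(record.get("positive_ids")):
        return False
    # Assumption: a null-typed sequence is an empty list or contains only None.
    negatives = record.get("negative_ids")
    if not isinstance(negatives, list) or any(v is not None for v in negatives):
        return False
    outputs = record.get("outputs")
    if not isinstance(outputs, list):
        return False
    return all(
        isinstance(item, dict)
        and isinstance(item.get("model"), str)
        and isinstance(item.get("output"), str)
        for item in outputs
    )


# Hypothetical record; the values are illustrative only.
example = {
    "prompt": "Answer the question using the passages provided.",
    "query_id": "example-query-0",
    "positive_ids": ["example-doc-1"],
    "negative_ids": [],
    "outputs": [{"model": "example-model", "output": "An example answer."}],
}

print(is_valid_record(example))
```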
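The training dataset size, download size, and number of training examples for each language configuration are as follows:
| Language config | Training dataset size (bytes) | Download size (bytes) | Training examples |
|---|---|---|---|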
| ar | 12924404 | 5353936 | 3430 |
| bn | 10401633 | 3489044 | 1619 |
| en | 9339231 | 4581070 | 2848 |
| es | 9318622 | 4737737 | 2144 |
| fa | 8246011 | 3254151 | 2103 |
| fi | 6118218 | 2873116 | 2841 |
| fr | 3096232 | 1500954 | 1089 |
| hi | 6604164 | 0 | 1164 |
| id | 13923027 | 6636098 | 4048 |
| ja | 10055584 | 4695623 | 3456 |
| ko | 2283556 | 0 | 839 |
| ru | 19061453 | 8297600 | 4567 |
| sw | 2806028 | 0 | 1787 |
| te | 10823015 | 3411451 | 3255 |
| th | 14520505 | 4809627 | 2954 |
| zh | 3395014 | 1780936 | 1310 |
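The figures in the table can be aggregated with a few lines of Python. The sketch below copies the table values into a dictionary (the name `STATS` is a hypothetical choice) and derives the total example count, the total dataset size, and the configurations whose reported download size is 0. It does not explain why those entries are 0; it only surfaces them.

```python
# (dataset_size_bytes, download_size_bytes, num_examples), copied from the table.
STATS = {
    "ar": (12924404, 5353936, 3430),
    "bn": (10401633, 3489044, 1619),
    "en": (9339231, 4581070, 2848),
    "es": (9318622, 4737737, 2144),
    "fa": (8246011, 3254151, 2103),
    "fi": (6118218, 2873116, 2841),
    "fr": (3096232, 1500954, 1089),
    "hi": (6604164, 0, 1164),
    "id": (13923027, 6636098, 4048),
    "ja": (10055584, 4695623, 3456),
    "ko": (2283556, 0, 839),
    "ru": (19061453, 8297600, 4567),
    "sw": (2806028, 0, 1787),
    "te": (10823015, 3411451, 3255),
    "th": (14520505, 4809627, 2954),
    "zh": (3395014, 1780936, 1310),
}

total_examples = sum(n for _, _, n in STATS.values())
total_bytes = sum(size for size, _, _ in STATS.values())

# Configurations whose metadata reports a download size of 0.
zero_download = sorted(lang for lang, (_, dl, _) in STATS.items() if dl == 0)

# Average uncompressed bytes per training example, per configuration.
avg_bytes = {lang: size / n for lang, (size, _, n) in STATS.items()}

print(f"configs: {len(STATS)}")
print(f"total training examples: {total_examples}")
print(f"total dataset size (bytes): {total_bytes}")
print(f"download_size reported as 0: {zero_download}")
```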
The training data files for each language configuration follow the path pattern <language>/train-*.
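As a rough illustration of how the configuration names and path pattern fit together, the following Python sketch builds the pattern for a language code and loads one configuration through the Hugging Face `datasets` library. The helper names `train_pattern` and `load_train_split` are hypothetical. The loader assumes `datasets` is installed and the repository is reachable over the network, so it imports the library only when called.

```python
REPO_ID = "nthakur/miracl-raft-instruct-v0.2"

# Configuration names as listed in the dataset metadata.
LANGS = ["ar", "bn", "en", "es", "fa", "fi", "fr", "hi",
         "id", "ja", "ko", "ru", "sw", "te", "th", "zh"]


def train_pattern(lang: str) -> str:
    """Return the train file pattern for a configuration, e.g. 'ar/train-*'."""
    if lang not in LANGS:
        raise ValueError(f"unknown language configuration: {lang!r}")
    return f"{lang}/train-*"


def load_train_split(lang: str):
    """Load the train split of one configuration (needs network access)."""
    # Imported lazily so the rest of this sketch runs without the library.
    from datasets import load_dataset

    train_pattern(lang)  # validates the language code
    return load_dataset(REPO_ID, lang, split="train")


print(train_pattern("ar"))
```

Calling `load_train_split("en")` would return the English train split as a `Dataset` object, provided the repository is available under the same name.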



