nthakur/nomiracl-raft-instruct-mistral
Source: Hugging Face. Updated: 2024-04-11. Indexed: 2024-06-12.
Download link:
https://hf-mirror.com/datasets/nthakur/nomiracl-raft-instruct-mistral
Resource description:
---
dataset_info:
- config_name: ar
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 2735079
    num_examples: 303
  download_size: 1140409
  dataset_size: 2735079
- config_name: bn
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 10136389
    num_examples: 821
  download_size: 3196811
  dataset_size: 10136389
- config_name: en
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 1520285
    num_examples: 255
  download_size: 737245
  dataset_size: 1520285
- config_name: es
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 1702736
    num_examples: 325
  download_size: 800383
  dataset_size: 1702736
- config_name: fa
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 10060118
    num_examples: 1397
  download_size: 3672376
  dataset_size: 10060118
- config_name: fi
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 613801
    num_examples: 112
  download_size: 338235
  dataset_size: 613801
- config_name: fr
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 5131163
    num_examples: 1005
  download_size: 2330304
  dataset_size: 5131163
- config_name: hi
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 19409785
    num_examples: 1845
  download_size: 6047001
  dataset_size: 19409785
- config_name: id
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 2916188
    num_examples: 545
  download_size: 1369968
  dataset_size: 2916188
- config_name: ja
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 2253271
    num_examples: 367
  download_size: 1008667
  dataset_size: 2253271
- config_name: ko
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 8068021
    num_examples: 1453
  download_size: 3730192
  dataset_size: 8068021
- config_name: ru
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 3433771
    num_examples: 361
  download_size: 1423396
  dataset_size: 3433771
- config_name: sw
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 2770975
    num_examples: 576
  download_size: 1278519
  dataset_size: 2770975
- config_name: te
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 1619283
    num_examples: 167
  download_size: 557845
  dataset_size: 1619283
- config_name: th
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 6090937
    num_examples: 543
  download_size: 1914954
  dataset_size: 6090937
- config_name: zh
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 5992472
    num_examples: 1409
  download_size: 2950247
  dataset_size: 5992472
configs:
- config_name: ar
  data_files:
  - split: train
    path: ar/train-*
- config_name: bn
  data_files:
  - split: train
    path: bn/train-*
- config_name: en
  data_files:
  - split: train
    path: en/train-*
- config_name: es
  data_files:
  - split: train
    path: es/train-*
- config_name: fa
  data_files:
  - split: train
    path: fa/train-*
- config_name: fi
  data_files:
  - split: train
    path: fi/train-*
- config_name: fr
  data_files:
  - split: train
    path: fr/train-*
- config_name: hi
  data_files:
  - split: train
    path: hi/train-*
- config_name: id
  data_files:
  - split: train
    path: id/train-*
- config_name: ja
  data_files:
  - split: train
    path: ja/train-*
- config_name: ko
  data_files:
  - split: train
    path: ko/train-*
- config_name: ru
  data_files:
  - split: train
    path: ru/train-*
- config_name: sw
  data_files:
  - split: train
    path: sw/train-*
- config_name: te
  data_files:
  - split: train
    path: te/train-*
- config_name: th
  data_files:
  - split: train
    path: th/train-*
- config_name: zh
  data_files:
  - split: train
    path: zh/train-*
---
Provider:
nthakur
Summary of original information
Dataset overview
Dataset configurations and features
- Configuration names:
  - The dataset has 16 configurations, one per language: ar, bn, en, es, fa, fi, fr, hi, id, ja, ko, ru, sw, te, th, zh.
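Each configuration name doubles as the directory prefix of its data files (for example `en/train-*`). The sketch below shows one way to select a language, assuming the Hugging Face `datasets` library is installed and the repository is reachable. `LANGUAGES`, `train_glob`, and `load_language` are illustrative names, not part of the dataset card.

```python
# Repository ID as given in the listing title.
REPO_ID = "nthakur/nomiracl-raft-instruct-mistral"

# The 16 configuration names listed in the card metadata.
LANGUAGES = [
    "ar", "bn", "en", "es", "fa", "fi", "fr", "hi",
    "id", "ja", "ko", "ru", "sw", "te", "th", "zh",
]


def train_glob(lang: str) -> str:
    """Return the data_files pattern the card lists for a config, e.g. 'en/train-*'."""
    if lang not in LANGUAGES:
        raise ValueError(f"unknown configuration: {lang!r}")
    return f"{lang}/train-*"


def load_language(lang: str):
    """Load the train split of one configuration (hypothetical helper).

    Requires the `datasets` package and network access, so the import is
    done lazily and the helpers above stay usable offline.
    """
    from datasets import load_dataset

    train_glob(lang)  # raises ValueError for an unknown configuration
    return load_dataset(REPO_ID, lang, split="train")
```

Calling `load_language("en")` would request only the English configuration rather than all 16 languages.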
- Features:
  - Every configuration has the same features:
    - output: a list of entries, each with a model field (string) and an output field (string).
    - prompt: string.
    - query_id: string.
    - doc_ids: sequence of string.
    - positive_ids: sequence of string.
    - negative_ids: sequence of null.
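The feature list above can be read as a per-record schema. The following is a minimal sketch of a checker for that schema. `validate_record` is a hypothetical helper, and the sample record contains made-up placeholder values rather than real dataset content. It assumes `negative_ids` arrives as a (possibly empty) list whose items are all `None`, which is how a `sequence: 'null'` feature is interpreted here.

```python
from typing import Any, Dict, List


def is_str_list(value: Any) -> bool:
    """True if value is a list whose items are all strings."""
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of schema problems; an empty list means the record matches."""
    problems = []

    outputs = record.get("output")
    if not isinstance(outputs, list) or not all(
        isinstance(o, dict)
        and isinstance(o.get("model"), str)
        and isinstance(o.get("output"), str)
        for o in outputs
    ):
        problems.append("output must be a list of {model: str, output: str}")

    for key in ("prompt", "query_id"):
        if not isinstance(record.get(key), str):
            problems.append(f"{key} must be a string")

    for key in ("doc_ids", "positive_ids"):
        if not is_str_list(record.get(key)):
            problems.append(f"{key} must be a list of strings")

    # Assumption: a 'null' sequence is a list containing only None values.
    negatives = record.get("negative_ids")
    if not isinstance(negatives, list) or any(v is not None for v in negatives):
        problems.append("negative_ids must be a list containing only nulls")

    return problems


# Made-up placeholder record, for illustration only.
sample = {
    "output": [{"model": "example-model", "output": "example answer"}],
    "prompt": "example prompt",
    "query_id": "q-0",
    "doc_ids": ["d-0", "d-1"],
    "positive_ids": ["d-0"],
    "negative_ids": [],
}
print(validate_record(sample))
```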
Dataset size and splits
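- Train split:
  - Size and number of examples per configuration:
    - ar: 2735079 bytes, 303 examples.
    - bn: 10136389 bytes, 821 examples.
    - en: 1520285 bytes, 255 examples.
    - es: 1702736 bytes, 325 examples.
    - fa: 10060118 bytes, 1397 examples.
    - fi: 613801 bytes, 112 examples.
    - fr: 5131163 bytes, 1005 examples.
    - hi: 19409785 bytes, 1845 examples.
    - id: 2916188 bytes, 545 examples.
    - ja: 2253271 bytes, 367 examples.
    - ko: 8068021 bytes, 1453 examples.
    - ru: 3433771 bytes, 361 examples.
    - sw: 2770975 bytes, 576 examples.
    - te: 1619283 bytes, 167 examples.
    - th: 6090937 bytes, 543 examples.
    - zh: 5992472 bytes, 1409 examples.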
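- Download size:
  - Download size per configuration:
    - ar: 1140409 bytes.
    - bn: 3196811 bytes.
    - en: 737245 bytes.
    - es: 800383 bytes.
    - fa: 3672376 bytes.
    - fi: 338235 bytes.
    - fr: 2330304 bytes.
    - hi: 6047001 bytes.
    - id: 1369968 bytes.
    - ja: 1008667 bytes.
    - ko: 3730192 bytes.
    - ru: 1423396 bytes.
    - sw: 1278519 bytes.
    - te: 557845 bytes.
    - th: 1914954 bytes.
    - zh: 2950247 bytes.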
- Total dataset size:
  - For every configuration, dataset_size equals the train split's num_bytes, because train is the only split.
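The per-configuration figures can be combined into totals and averages. The sketch below copies the numbers from the card metadata into a dictionary and derives aggregate values from them. `STATS`, `totals`, and `avg_bytes_per_example` are illustrative names introduced here, not part of the dataset.

```python
# (num_bytes, num_examples, download_size) per configuration,
# copied from the card metadata.
STATS = {
    "ar": (2735079, 303, 1140409),
    "bn": (10136389, 821, 3196811),
    "en": (1520285, 255, 737245),
    "es": (1702736, 325, 800383),
    "fa": (10060118, 1397, 3672376),
    "fi": (613801, 112, 338235),
    "fr": (5131163, 1005, 2330304),
    "hi": (19409785, 1845, 6047001),
    "id": (2916188, 545, 1369968),
    "ja": (2253271, 367, 1008667),
    "ko": (8068021, 1453, 3730192),
    "ru": (3433771, 361, 1423396),
    "sw": (2770975, 576, 1278519),
    "te": (1619283, 167, 557845),
    "th": (6090937, 543, 1914954),
    "zh": (5992472, 1409, 2950247),
}


def totals(stats):
    """Sum each column: (total num_bytes, total num_examples, total download_size)."""
    return tuple(sum(column) for column in zip(*stats.values()))


def avg_bytes_per_example(stats):
    """Average in-memory size of one example, per configuration."""
    return {lang: nbytes / nexamples for lang, (nbytes, nexamples, _) in stats.items()}


total_bytes, total_examples, total_download = totals(STATS)
largest = max(STATS, key=lambda lang: STATS[lang][1])
smallest = min(STATS, key=lambda lang: STATS[lang][1])

print(f"examples across all configurations: {total_examples}")
print(f"largest / smallest configuration by examples: {largest} / {smallest}")
print(f"download size as a fraction of dataset size: {total_download / total_bytes:.2f}")
```

The average bytes per example differs noticeably between languages, which is worth keeping in mind when budgeting sequence lengths per configuration.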



