nthakur/miracl-raft-eval-instruct
Source: Hugging Face. Updated: 2024-04-19. Indexed: 2024-06-12.
Download link:
https://hf-mirror.com/datasets/nthakur/miracl-raft-eval-instruct
Resource description:
---
dataset_info:
- config_name: ar
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 21119270
num_examples: 2896
- name: dev.small
num_bytes: 729256.5607734807
num_examples: 100
download_size: 10414806
dataset_size: 21848526.56077348
- config_name: bn
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 4350591
num_examples: 411
- name: dev.small
num_bytes: 1058537.9562043797
num_examples: 100
download_size: 1988731
dataset_size: 5409128.95620438
- config_name: de
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev.small
num_bytes: 455798.0327868852
num_examples: 100
- name: dev
num_bytes: 1390184
num_examples: 305
download_size: 1048951
dataset_size: 1845982.0327868853
- config_name: en
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 4106934
num_examples: 799
- name: dev.small
num_bytes: 514009.2615769712
num_examples: 100
download_size: 2561467
dataset_size: 4620943.261576971
- config_name: es
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 2974428
num_examples: 648
- name: dev.small
num_bytes: 459016.6666666667
num_examples: 100
download_size: 1983531
dataset_size: 3433444.6666666665
- config_name: fa
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 3608534
num_examples: 632
- name: dev.small
num_bytes: 570970.5696202532
num_examples: 100
download_size: 1906481
dataset_size: 4179504.569620253
- config_name: fi
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 5362996
num_examples: 1271
- name: dev.small
num_bytes: 421950.90479937056
num_examples: 100
download_size: 3297373
dataset_size: 5784946.90479937
- config_name: fr
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 1438428
num_examples: 343
- name: dev.small
num_bytes: 419366.7638483965
num_examples: 100
download_size: 1045572
dataset_size: 1857794.7638483965
- config_name: hi
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 3122567
num_examples: 350
- name: dev.small
num_bytes: 892162.0
num_examples: 100
download_size: 1503974
dataset_size: 4014729.0
- config_name: id
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 4504281
num_examples: 960
- name: dev.small
num_bytes: 469195.9375
num_examples: 100
download_size: 2674307
dataset_size: 4973476.9375
- config_name: ja
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 4482857
num_examples: 860
- name: dev.small
num_bytes: 521262.4418604651
num_examples: 100
download_size: 2731831
dataset_size: 5004119.441860465
- config_name: ko
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 970749
num_examples: 213
- name: dev.small
num_bytes: 455750.7042253521
num_examples: 100
download_size: 792868
dataset_size: 1426499.704225352
- config_name: ru
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 11085203
num_examples: 1252
- name: dev.small
num_bytes: 885399.6006389776
num_examples: 100
download_size: 5823124
dataset_size: 11970602.600638978
- config_name: sw
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 1797403
num_examples: 482
- name: dev.small
num_bytes: 372905.1867219917
num_examples: 100
download_size: 1232620
dataset_size: 2170308.1867219917
- config_name: te
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 2057912
num_examples: 828
- name: dev.small
num_bytes: 248540.0966183575
num_examples: 100
download_size: 770401
dataset_size: 2306452.0966183576
- config_name: th
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 7233501
num_examples: 733
- name: dev.small
num_bytes: 986835.0613915416
num_examples: 100
download_size: 3043426
dataset_size: 8220336.061391542
- config_name: yo
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev.small
num_bytes: 356994.9579831933
num_examples: 100
- name: dev
num_bytes: 424824
num_examples: 119
download_size: 450789
dataset_size: 781818.9579831932
- config_name: zh
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 1474186
num_examples: 393
- name: dev.small
num_bytes: 375110.94147582696
num_examples: 100
download_size: 1154747
dataset_size: 1849296.941475827
configs:
- config_name: ar
data_files:
- split: dev
path: ar/dev-*
- split: dev.small
path: ar/dev.small-*
- config_name: bn
data_files:
- split: dev
path: bn/dev-*
- split: dev.small
path: bn/dev.small-*
- config_name: de
data_files:
- split: dev.small
path: de/dev.small-*
- split: dev
path: de/dev-*
- config_name: en
data_files:
- split: dev
path: en/dev-*
- split: dev.small
path: en/dev.small-*
- config_name: es
data_files:
- split: dev
path: es/dev-*
- split: dev.small
path: es/dev.small-*
- config_name: fa
data_files:
- split: dev
path: fa/dev-*
- split: dev.small
path: fa/dev.small-*
- config_name: fi
data_files:
- split: dev
path: fi/dev-*
- split: dev.small
path: fi/dev.small-*
- config_name: fr
data_files:
- split: dev
path: fr/dev-*
- split: dev.small
path: fr/dev.small-*
- config_name: hi
data_files:
- split: dev
path: hi/dev-*
- split: dev.small
path: hi/dev.small-*
- config_name: id
data_files:
- split: dev
path: id/dev-*
- split: dev.small
path: id/dev.small-*
- config_name: ja
data_files:
- split: dev
path: ja/dev-*
- split: dev.small
path: ja/dev.small-*
- config_name: ko
data_files:
- split: dev
path: ko/dev-*
- split: dev.small
path: ko/dev.small-*
- config_name: ru
data_files:
- split: dev
path: ru/dev-*
- split: dev.small
path: ru/dev.small-*
- config_name: sw
data_files:
- split: dev
path: sw/dev-*
- split: dev.small
path: sw/dev.small-*
- config_name: te
data_files:
- split: dev
path: te/dev-*
- split: dev.small
path: te/dev.small-*
- config_name: th
data_files:
- split: dev
path: th/dev-*
- split: dev.small
path: th/dev.small-*
- config_name: yo
data_files:
- split: dev.small
path: yo/dev.small-*
- split: dev
path: yo/dev-*
- config_name: zh
data_files:
- split: dev
path: zh/dev-*
- split: dev.small
path: zh/dev.small-*
---
Provider:
nthakur
Summary of original information
Dataset overview
Dataset configurations and features
- Config names: 18 language configs (ar, bn, de, en, es, fa, fi, fr, hi, id, ja, ko, ru, sw, te, th, yo, zh).
- Features:
  - query_id: string.
  - prompt: string.
  - positive_ids: sequence of string.
  - negative_ids: sequence of string.
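The four features above can be written down as a typed record. This is a minimal sketch, not code shipped with the dataset: the names `RaftEvalRecord` and `is_valid_record` are hypothetical, and the sample values are made-up placeholders rather than real rows.

```python
from typing import List, TypedDict


class RaftEvalRecord(TypedDict):
    """One row, mirroring the four features listed in the dataset card."""
    query_id: str
    prompt: str
    positive_ids: List[str]
    negative_ids: List[str]


EXPECTED_FIELDS = {"query_id", "prompt", "positive_ids", "negative_ids"}


def is_valid_record(row: dict) -> bool:
    """Return True if `row` has exactly the documented fields and types."""
    if set(row) != EXPECTED_FIELDS:
        return False
    if not isinstance(row["query_id"], str) or not isinstance(row["prompt"], str):
        return False
    for key in ("positive_ids", "negative_ids"):
        ids = row[key]
        # A "sequence of string" feature arrives as a Python list of str.
        if not isinstance(ids, list) or not all(isinstance(i, str) for i in ids):
            return False
    return True


# Placeholder row for illustration only; not taken from the dataset.
sample: RaftEvalRecord = {
    "query_id": "q-0",
    "prompt": "placeholder prompt text",
    "positive_ids": ["doc-1"],
    "negative_ids": ["doc-2", "doc-3"],
}
```

A check like this is useful as a guard before feeding rows into an evaluation loop, since a malformed row would otherwise fail much later.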
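Dataset splits
- Split names: each config has two splits, dev and dev.small.
- Size:
  - dev: the size in bytes and the number of examples vary by language, from 119 examples (yo) to 2,896 (ar).
  - dev.small: always 100 examples; the size in bytes varies by language.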
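Dataset size and download size
- Download size: varies by language, from 450,789 bytes (yo) to 10,414,806 bytes (ar).
- Dataset size: the total size across both splits varies by language, from about 781,819 bytes (yo) to about 21,848,527 bytes (ar).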
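The fractional byte counts in the metadata follow a simple pattern, sketched below. The card does not state this relationship; it is an observation from the listed numbers, which suggests the dev.small sizes are proportional estimates rather than measured file sizes. The function name `scaled_num_bytes` is hypothetical.

```python
def scaled_num_bytes(dev_num_bytes: int, dev_num_examples: int,
                     subset_examples: int = 100) -> float:
    """Scale the dev split's byte count down to a subset of `subset_examples` rows."""
    return dev_num_bytes * subset_examples / dev_num_examples


# Figures copied from the hi and id configs in the metadata above.
hi_small = scaled_num_bytes(3122567, 350)   # 892162.0, the listed dev.small num_bytes
hi_total = 3122567 + hi_small               # 4014729.0, the listed dataset_size

id_small = scaled_num_bytes(4504281, 960)   # 469195.9375
id_total = 4504281 + id_small               # 4973476.9375
```

In both cases dataset_size equals the sum of the two splits' num_bytes, which is why it is also fractional.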
Data file paths
- Paths: each data file path depends on the language config and the split (dev or dev.small), in the format [language]/[split]-*.
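The path format can be illustrated with a small helper that builds the glob pattern for a config and split. This is a sketch under assumptions: `data_file_pattern` and `load_split` are hypothetical names, the shard filenames are invented for the example, and `load_split` assumes the Hugging Face `datasets` library is installed and the repository is reachable.

```python
from fnmatch import fnmatchcase


def data_file_pattern(language: str, split: str) -> str:
    """Build the glob pattern for one language config and split."""
    return f"{language}/{split}-*"


def load_split(language: str, split: str = "dev.small"):
    """Load one split of one language config (needs `datasets` and network access)."""
    from datasets import load_dataset  # imported lazily so the helper above runs without it
    return load_dataset("nthakur/miracl-raft-eval-instruct", language, split=split)


small_pattern = data_file_pattern("ar", "dev.small")
dev_pattern = data_file_pattern("ar", "dev")

# Hypothetical shard names, used only to show how the patterns match.
small_shard = "ar/dev.small-00000-of-00001.parquet"
dev_shard = "ar/dev-00000-of-00001.parquet"

matches_small = fnmatchcase(small_shard, small_pattern)
# The dev pattern requires a "-" right after "dev", so it does not pick up dev.small files.
dev_excludes_small = not fnmatchcase(small_shard, dev_pattern)
```

For quick experiments, `load_split("en")` would fetch the 100-example dev.small split of the English config.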



