nthakur/mirage-eval
Source: Hugging Face. Updated 2024-06-10; indexed 2024-06-29.
Download link:
https://hf-mirror.com/datasets/nthakur/mirage-eval
Resource description:
---
dataset_info:
- config_name: ar
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 19528139
num_examples: 1501
- name: dev.small
num_bytes: 1301008.5942704864
num_examples: 100
download_size: 40058430
dataset_size: 20829147.594270486
- config_name: bn
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 6425377
num_examples: 411
- name: dev.small
num_bytes: 1563352.0681265206
num_examples: 100
download_size: 11704916
dataset_size: 7988729.06812652
- config_name: de
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 2249942
num_examples: 304
- name: dev.small
num_bytes: 740112.5
num_examples: 100
download_size: 1730375
dataset_size: 2990054.5
- config_name: en
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 6783947
num_examples: 787
- name: dev.small
num_bytes: 862000.8894536213
num_examples: 100
download_size: 4321090
dataset_size: 7645947.889453622
- config_name: es
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 4384991
num_examples: 617
- name: dev.small
num_bytes: 710695.4619124797
num_examples: 100
download_size: 2960094
dataset_size: 5095686.461912479
- config_name: fa
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 6329115
num_examples: 632
- name: dev.small
num_bytes: 1001442.2468354431
num_examples: 100
download_size: 3382168
dataset_size: 7330557.246835443
- config_name: fi
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 8631071
num_examples: 1183
- name: dev.small
num_bytes: 729591.8005071852
num_examples: 100
download_size: 16365685
dataset_size: 9360662.800507186
- config_name: fr
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 2234807
num_examples: 343
- name: dev.small
num_bytes: 651547.2303206997
num_examples: 100
download_size: 1637294
dataset_size: 2886354.2303206995
- config_name: hi
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 5331616
num_examples: 350
- name: dev.small
num_bytes: 1523318.857142857
num_examples: 100
download_size: 2562093
dataset_size: 6854934.857142857
- config_name: id
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 7117754
num_examples: 939
- name: dev.small
num_bytes: 758014.2705005325
num_examples: 100
download_size: 12839552
dataset_size: 7875768.270500532
- config_name: ja
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 7409818
num_examples: 797
- name: dev.small
num_bytes: 929713.6762860728
num_examples: 100
download_size: 13872765
dataset_size: 8339531.6762860725
- config_name: ko
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 2191398
num_examples: 213
- name: dev.small
num_bytes: 1028825.352112676
num_examples: 100
download_size: 5514338
dataset_size: 3220223.352112676
- config_name: ru
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 18918694
num_examples: 1247
- name: dev.small
num_bytes: 1517136.6479550921
num_examples: 100
download_size: 29987695
dataset_size: 20435830.647955094
- config_name: sw
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 3083266
num_examples: 481
- name: dev.small
num_bytes: 641011.6424116424
num_examples: 100
download_size: 6296363
dataset_size: 3724277.6424116422
- config_name: te
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 1074836
num_examples: 150
- name: dev.small
num_bytes: 716557.3333333334
num_examples: 100
download_size: 4231441
dataset_size: 1791393.3333333335
- config_name: th
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 12832048
num_examples: 730
- name: dev.small
num_bytes: 1757814.7945205478
num_examples: 100
download_size: 5432093
dataset_size: 14589862.794520548
- config_name: yo
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 655554
num_examples: 119
- name: dev.small
num_bytes: 550885.7142857143
num_examples: 100
download_size: 668617
dataset_size: 1206439.7142857143
- config_name: zh
features:
- name: query_id
dtype: string
- name: prompt
dtype: string
- name: positive_ids
sequence: string
- name: negative_ids
sequence: string
splits:
- name: dev
num_bytes: 2266547
num_examples: 391
- name: dev.small
num_bytes: 579679.5396419438
num_examples: 100
download_size: 1810423
dataset_size: 2846226.5396419438
configs:
- config_name: ar
data_files:
- split: dev
path: ar/dev-*
- split: dev.small
path: ar/dev.small-*
- config_name: bn
data_files:
- split: dev
path: bn/dev-*
- split: dev.small
path: bn/dev.small-*
- config_name: de
data_files:
- split: dev
path: de/dev-*
- split: dev.small
path: de/dev.small-*
- config_name: en
data_files:
- split: dev
path: en/dev-*
- split: dev.small
path: en/dev.small-*
- config_name: es
data_files:
- split: dev
path: es/dev-*
- split: dev.small
path: es/dev.small-*
- config_name: fa
data_files:
- split: dev
path: fa/dev-*
- split: dev.small
path: fa/dev.small-*
- config_name: fi
data_files:
- split: dev
path: fi/dev-*
- split: dev.small
path: fi/dev.small-*
- config_name: fr
data_files:
- split: dev
path: fr/dev-*
- split: dev.small
path: fr/dev.small-*
- config_name: hi
data_files:
- split: dev
path: hi/dev-*
- split: dev.small
path: hi/dev.small-*
- config_name: id
data_files:
- split: dev
path: id/dev-*
- split: dev.small
path: id/dev.small-*
- config_name: ja
data_files:
- split: dev
path: ja/dev-*
- split: dev.small
path: ja/dev.small-*
- config_name: ko
data_files:
- split: dev
path: ko/dev-*
- split: dev.small
path: ko/dev.small-*
- config_name: ru
data_files:
- split: dev
path: ru/dev-*
- split: dev.small
path: ru/dev.small-*
- config_name: sw
data_files:
- split: dev
path: sw/dev-*
- split: dev.small
path: sw/dev.small-*
- config_name: te
data_files:
- split: dev
path: te/dev-*
- split: dev.small
path: te/dev.small-*
- config_name: th
data_files:
- split: dev
path: th/dev-*
- split: dev.small
path: th/dev.small-*
- config_name: yo
data_files:
- split: dev
path: yo/dev-*
- split: dev.small
path: yo/dev.small-*
- config_name: zh
data_files:
- split: dev
path: zh/dev-*
- split: dev.small
path: zh/dev.small-*
---
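The `configs` entries above map each split to a glob pattern inside a per-language directory. The following is a minimal sketch of how such patterns keep the two splits apart, using Python's standard `fnmatch` module. The shard file names are hypothetical (they imitate a common Parquet shard naming style and are not taken from the repository's file listing), and the Hub's own pattern resolution may work differently.

```python
from fnmatch import fnmatch

# Patterns copied from the `configs` section for the `ar` config.
PATTERNS = {
    "dev": "ar/dev-*",
    "dev.small": "ar/dev.small-*",
}

# Hypothetical shard names, used only for illustration.
FILES = [
    "ar/dev-00000-of-00001.parquet",
    "ar/dev.small-00000-of-00001.parquet",
]


def files_for_split(split: str) -> list[str]:
    """Return the files whose path matches the split's glob pattern."""
    pattern = PATTERNS[split]
    return [path for path in FILES if fnmatch(path, pattern)]


print(files_for_split("dev"))
print(files_for_split("dev.small"))
```

Because the hyphen follows `dev` directly in `ar/dev-*`, that pattern does not pick up files that start with `dev.small-`.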
Provider:
nthakur
Summary of original information
Dataset overview
Dataset configurations
Language configurations
- Arabic (ar)
- Bengali (bn)
- German (de)
- English (en)
- Spanish (es)
- Persian (fa)
- Finnish (fi)
- French (fr)
- Hindi (hi)
- Indonesian (id)
- Japanese (ja)
- Korean (ko)
- Russian (ru)
- Swahili (sw)
- Telugu (te)
- Thai (th)
- Yoruba (yo)
- Chinese (zh)
Features
- query_id: string
- prompt: string
- positive_ids: sequence of strings
- negative_ids: sequence of strings
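The four fields can be written down as a typed record. The sketch below is illustrative only: the class name `MirageEvalRecord`, the helper `is_valid_record`, and the example values are assumptions made for this example, not part of the dataset or any library.

```python
from typing import Any, TypedDict


class MirageEvalRecord(TypedDict):
    """One row, with the four fields listed in the dataset card."""

    query_id: str
    prompt: str
    positive_ids: list[str]
    negative_ids: list[str]


def is_valid_record(row: dict[str, Any]) -> bool:
    """Check that a row has exactly the expected fields and value types."""
    if set(row) != {"query_id", "prompt", "positive_ids", "negative_ids"}:
        return False
    if not isinstance(row["query_id"], str) or not isinstance(row["prompt"], str):
        return False
    for key in ("positive_ids", "negative_ids"):
        ids = row[key]
        if not isinstance(ids, list) or not all(isinstance(i, str) for i in ids):
            return False
    return True


# Made-up values for illustration; real rows will differ.
example: MirageEvalRecord = {
    "query_id": "q-0",
    "prompt": "placeholder prompt text",
    "positive_ids": ["doc-1"],
    "negative_ids": ["doc-2", "doc-3"],
}
print(is_valid_record(example))
```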
Data splits
- dev: development set
- dev.small: small development set (100 examples per language)
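One way to load a single language and split is through the Hugging Face `datasets` library. The sketch below assumes that the repository `nthakur/mirage-eval` is still publicly reachable and that the `datasets` package is installed. The wrapper `load_mirage_split` and the constants `CONFIGS` and `SPLITS` are hypothetical names introduced here; only `datasets.load_dataset` is a real library call.

```python
# Config and split names as listed in the dataset card.
CONFIGS = (
    "ar", "bn", "de", "en", "es", "fa", "fi", "fr", "hi",
    "id", "ja", "ko", "ru", "sw", "te", "th", "yo", "zh",
)
SPLITS = ("dev", "dev.small")


def load_mirage_split(config: str, split: str = "dev.small"):
    """Load one language config and split (hypothetical helper).

    Arguments are validated before any network access happens.
    """
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Third-party dependency, imported lazily so validation works without it.
    from datasets import load_dataset

    return load_dataset("nthakur/mirage-eval", config, split=split)
```

Calling `load_mirage_split("ar")` would then download the 100-example Arabic split, provided network access is available; starting with `dev.small` keeps the first download small.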
Dataset sizes
Arabic (ar)
- Download size: 40058430 bytes
- Dataset size: 20829147.594270486 bytes
- dev split: 19528139 bytes, 1501 examples
- dev.small split: 1301008.5942704864 bytes, 100 examples
Bengali (bn)
- Download size: 11704916 bytes
- Dataset size: 7988729.06812652 bytes
- dev split: 6425377 bytes, 411 examples
- dev.small split: 1563352.0681265206 bytes, 100 examples
German (de)
- Download size: 1730375 bytes
- Dataset size: 2990054.5 bytes
- dev split: 2249942 bytes, 304 examples
- dev.small split: 740112.5 bytes, 100 examples
English (en)
- Download size: 4321090 bytes
- Dataset size: 7645947.889453622 bytes
- dev split: 6783947 bytes, 787 examples
- dev.small split: 862000.8894536213 bytes, 100 examples
Spanish (es)
- Download size: 2960094 bytes
- Dataset size: 5095686.461912479 bytes
- dev split: 4384991 bytes, 617 examples
- dev.small split: 710695.4619124797 bytes, 100 examples
Persian (fa)
- Download size: 3382168 bytes
- Dataset size: 7330557.246835443 bytes
- dev split: 6329115 bytes, 632 examples
- dev.small split: 1001442.2468354431 bytes, 100 examples
Finnish (fi)
- Download size: 16365685 bytes
- Dataset size: 9360662.800507186 bytes
- dev split: 8631071 bytes, 1183 examples
- dev.small split: 729591.8005071852 bytes, 100 examples
French (fr)
- Download size: 1637294 bytes
- Dataset size: 2886354.2303206995 bytes
- dev split: 2234807 bytes, 343 examples
- dev.small split: 651547.2303206997 bytes, 100 examples
Hindi (hi)
- Download size: 2562093 bytes
- Dataset size: 6854934.857142857 bytes
- dev split: 5331616 bytes, 350 examples
- dev.small split: 1523318.857142857 bytes, 100 examples
Indonesian (id)
- Download size: 12839552 bytes
- Dataset size: 7875768.270500532 bytes
- dev split: 7117754 bytes, 939 examples
- dev.small split: 758014.2705005325 bytes, 100 examples
Japanese (ja)
- Download size: 13872765 bytes
- Dataset size: 8339531.6762860725 bytes
- dev split: 7409818 bytes, 797 examples
- dev.small split: 929713.6762860728 bytes, 100 examples
Korean (ko)
- Download size: 5514338 bytes
- Dataset size: 3220223.352112676 bytes
- dev split: 2191398 bytes, 213 examples
- dev.small split: 1028825.352112676 bytes, 100 examples
Russian (ru)
- Download size: 29987695 bytes
- Dataset size: 20435830.647955094 bytes
- dev split: 18918694 bytes, 1247 examples
- dev.small split: 1517136.6479550921 bytes, 100 examples
Swahili (sw)
- Download size: 6296363 bytes
- Dataset size: 3724277.6424116422 bytes
- dev split: 3083266 bytes, 481 examples
- dev.small split: 641011.6424116424 bytes, 100 examples
Telugu (te)
- Download size: 4231441 bytes
- Dataset size: 1791393.3333333335 bytes
- dev split: 1074836 bytes, 150 examples
- dev.small split: 716557.3333333334 bytes, 100 examples
Thai (th)
- Download size: 5432093 bytes
- Dataset size: 14589862.794520548 bytes
- dev split: 12832048 bytes, 730 examples
- dev.small split: 1757814.7945205478 bytes, 100 examples
Yoruba (yo)
- Download size: 668617 bytes
- Dataset size: 1206439.7142857143 bytes
- dev split: 655554 bytes, 119 examples
- dev.small split: 550885.7142857143 bytes, 100 examples
Chinese (zh)
- Download size: 1810423 bytes
- Dataset size: 2846226.5396419438 bytes
- dev split: 2266547 bytes, 391 examples
- dev.small split: 579679.5396419438 bytes, 100 examples
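The fractional byte counts listed for dev.small look like proportional estimates rather than measured file sizes: each one matches the dev byte count scaled by 100 / (number of dev examples), and each dataset size matches the sum of the two split sizes. This is an inference from the listed figures, not something the dataset card states. A short check on three of the configs, with the helper name `estimate_small_bytes` chosen only for this example:

```python
import math

# Per config: (dev bytes, dev examples, listed dev.small bytes, listed dataset size).
# All figures are copied from the size listing for ar, de, and te.
LISTED = {
    "ar": (19528139, 1501, 1301008.5942704864, 20829147.594270486),
    "de": (2249942, 304, 740112.5, 2990054.5),
    "te": (1074836, 150, 716557.3333333334, 1791393.3333333335),
}
SMALL_EXAMPLES = 100  # every dev.small split lists 100 examples


def estimate_small_bytes(dev_bytes: int, dev_examples: int) -> float:
    """Scale the dev byte count down to SMALL_EXAMPLES examples."""
    return dev_bytes * SMALL_EXAMPLES / dev_examples


for config, (dev_bytes, dev_examples, small_bytes, total_bytes) in LISTED.items():
    estimate = estimate_small_bytes(dev_bytes, dev_examples)
    small_matches = math.isclose(estimate, small_bytes, rel_tol=1e-9)
    total_matches = math.isclose(dev_bytes + small_bytes, total_bytes, rel_tol=1e-9)
    print(config, small_matches, total_matches)
```

If the inference holds, the dev.small byte counts should be read as approximations of in-memory size, and the download sizes remain the better guide to how much data is transferred.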



