unlearning-cleanslate/generations-nemotron-nano-9b-v2-simnpo-gentle-baseline
Source: Hugging Face. Updated 2026-04-30; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/unlearning-cleanslate/generations-nemotron-nano-9b-v2-simnpo-gentle-baseline
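A minimal loading sketch, assuming the `datasets` library is installed. Each `config_name` listed below is passed as the second argument to `load_dataset`, and every config exposes a single `train` split. The helper names `load_args` and `load_config` are illustrative and are not part of any published tooling for this dataset.

```python
REPO_ID = "unlearning-cleanslate/generations-nemotron-nano-9b-v2-simnpo-gentle-baseline"


def load_args(config_name: str) -> tuple:
    """(path, name) pair for datasets.load_dataset; config_name is e.g. "gsm8k"."""
    return REPO_ID, config_name


def load_config(config_name: str):
    """Download the train split of one config.

    To go through the mirror, export HF_ENDPOINT=https://hf-mirror.com in the
    shell before starting Python; huggingface_hub reads it at import time.
    """
    from datasets import load_dataset  # lazy import: this module loads without `datasets`

    path, name = load_args(config_name)
    return load_dataset(path, name, split="train")
```

Usage: `rows = load_config("arc_challenge")`, then inspect `rows[0]["score"]`.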
Description:
---
dataset_info:
- config_name: arc_challenge
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answerKey
dtype: string
- name: choices
struct:
- name: label
list: string
- name: text
list: string
- name: id
dtype: string
- name: question
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_4
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1903440
num_examples: 1172
download_size: 1729201
dataset_size: 1903440
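Multiple-choice configs such as `arc_challenge` store one `gen_args_i` per answer choice (up to five here) and nested `resps`. The sketch below rests on two assumptions. First, `filtered_resps[i]` is a `[log_likelihood, is_greedy]` pair, serialized as strings, for choice `i`. Second, `target` holds the gold choice index as a numeric string. Check both against real rows before relying on it.

```python
def predicted_choice(filtered_resps: list) -> int:
    """Index of the choice with the highest log-likelihood.

    Assumes filtered_resps[i] == [loglik, is_greedy] for choice i, as strings.
    """
    logliks = [float(pair[0]) for pair in filtered_resps]
    # max over indices returns the first index on ties
    return max(range(len(logliks)), key=logliks.__getitem__)


def is_correct(record: dict) -> bool:
    """Compare the argmax choice with the gold index in `target` (assumed numeric string)."""
    return predicted_choice(record["filtered_resps"]) == int(record["target"])


example = {
    "target": "2",
    "filtered_resps": [["-4.1", "False"], ["-3.7", "False"], ["-1.2", "True"], ["-5.0", "False"]],
}
print(is_correct(example))  # True
```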
- config_name: bbh_cot_fewshot_boolean_expressions
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 758630
num_examples: 250
download_size: 745986
dataset_size: 758630
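In the generative configs, `arguments.gen_args_0.arg_1` carries the generation settings (`do_sample`, `max_gen_toks`, `temperature`, `until`). `until` lists stop sequences. A minimal sketch of applying them cuts the output at the earliest stop string. The `gen_kwargs` values below are illustrative and are not read from this dataset.

```python
def truncate_at_stop(text: str, until: list) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence in `until`."""
    cut = len(text)
    for stop in until:
        if stop:  # ignore empty stop strings
            idx = text.find(stop)
            if idx != -1:
                cut = min(cut, idx)
    return text[:cut]


# Illustrative settings only; real values live in each row's arguments.gen_args_0.arg_1
gen_kwargs = {"do_sample": False, "max_gen_toks": 1024, "temperature": 0.0, "until": ["Q:", "\n\n"]}
raw = "Let's think step by step. not True is False. So the answer is False.\n\nQ: next question"
print(truncate_at_stop(raw, gen_kwargs["until"]))
# -> Let's think step by step. not True is False. So the answer is False.
```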
- config_name: bbh_cot_fewshot_causal_judgement
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1628813
num_examples: 187
download_size: 1619421
dataset_size: 1628813
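In the BBH chain-of-thought configs, `resps` holds the raw reasoning text, `filtered_resps` holds what a post-processing step extracted, and `filter` names that step. The sketch below re-derives an answer with a hypothetical rule: take the text after the last "the answer is", without the trailing period, and compare it with `target` by exact match. The rule that actually produced `filtered_resps` may differ.

```python
import re

# Hypothetical extraction rule; the `filter` column records the one actually used.
ANSWER_RE = re.compile(r"the answer is\s+(.+?)\.?(?:\n|$)", re.IGNORECASE)


def extract_answer(response: str) -> str:
    """Text after the last 'the answer is', or the stripped response if absent."""
    matches = ANSWER_RE.findall(response)
    return matches[-1].strip() if matches else response.strip()


def exact_match(prediction: str, target: str) -> bool:
    return prediction.strip() == target.strip()


resp = "not ( True ) and ( True ) = False and True = False. So the answer is False."
print(extract_answer(resp))  # False
```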
- config_name: bbh_cot_fewshot_date_understanding
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1478614
num_examples: 250
download_size: 1465819
dataset_size: 1478614
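Every row carries `doc_hash`, `prompt_hash` and `target_hash`. A quick consistency check, assuming these are hex SHA-256 digests of UTF-8 text (the convention lm-evaluation-harness uses for its sample logs): if `target_hash_ok` fails on real rows, the hashing convention differs. The same hashes let you join this run with another model's generations on the same documents.

```python
import hashlib


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def target_hash_ok(record: dict) -> bool:
    """True if target_hash equals SHA-256 of str(target) (assumed convention)."""
    return sha256_hex(str(record["target"])) == record["target_hash"]


def index_by_doc_hash(records: list) -> dict:
    """Map doc_hash -> record, e.g. to align this run with a baseline run."""
    return {r["doc_hash"]: r for r in records}
```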
- config_name: bbh_cot_fewshot_disambiguation_qa
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 2016799
num_examples: 250
download_size: 2014383
dataset_size: 2016799
- config_name: bbh_cot_fewshot_dyck_languages
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1555151
num_examples: 250
download_size: 1550864
dataset_size: 1555151
- config_name: bbh_cot_fewshot_formal_fallacies
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 2305611
num_examples: 250
download_size: 2288073
dataset_size: 2305611
- config_name: bbh_cot_fewshot_geometric_shapes
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 2016187
num_examples: 250
download_size: 1997455
dataset_size: 2016187
- config_name: bbh_cot_fewshot_hyperbaton
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1757770
num_examples: 250
download_size: 1751912
dataset_size: 1757770
- config_name: bbh_cot_fewshot_logical_deduction_five_objects
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1816540
num_examples: 250
download_size: 1811444
dataset_size: 1816540
- config_name: bbh_cot_fewshot_logical_deduction_seven_objects
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1611414
num_examples: 250
download_size: 1609227
dataset_size: 1611414
- config_name: bbh_cot_fewshot_logical_deduction_three_objects
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1678586
num_examples: 250
download_size: 1672977
dataset_size: 1678586
- config_name: bbh_cot_fewshot_movie_recommendation
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1647888
num_examples: 250
download_size: 1637866
dataset_size: 1647888
- config_name: bbh_cot_fewshot_multistep_arithmetic_two
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1149526
num_examples: 250
download_size: 1152397
dataset_size: 1149526
- config_name: bbh_cot_fewshot_navigate
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1589118
num_examples: 250
download_size: 1580001
dataset_size: 1589118
- config_name: bbh_cot_fewshot_object_counting
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 626198
num_examples: 250
download_size: 614272
dataset_size: 626198
- config_name: bbh_cot_fewshot_penguins_in_a_table
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1001308
num_examples: 146
download_size: 1007681
dataset_size: 1001308
- config_name: bbh_cot_fewshot_reasoning_about_colored_objects
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1298716
num_examples: 250
download_size: 1292237
dataset_size: 1298716
- config_name: bbh_cot_fewshot_ruin_names
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1559817
num_examples: 250
download_size: 1556106
dataset_size: 1559817
- config_name: bbh_cot_fewshot_salient_translation_error_detection
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 2435532
num_examples: 250
download_size: 2420345
dataset_size: 2435532
- config_name: bbh_cot_fewshot_snarks
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1319329
num_examples: 178
download_size: 1322633
dataset_size: 1319329
- config_name: bbh_cot_fewshot_sports_understanding
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 446702
num_examples: 250
download_size: 431502
dataset_size: 446702
- config_name: bbh_cot_fewshot_temporal_sequences
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1410435
num_examples: 250
download_size: 1407144
dataset_size: 1410435
- config_name: bbh_cot_fewshot_tracking_shuffled_objects_five_objects
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1312290
num_examples: 250
download_size: 1308436
dataset_size: 1312290
- config_name: bbh_cot_fewshot_tracking_shuffled_objects_seven_objects
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1583197
num_examples: 250
download_size: 1580101
dataset_size: 1583197
- config_name: bbh_cot_fewshot_tracking_shuffled_objects_three_objects
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1648563
num_examples: 250
download_size: 1643269
dataset_size: 1648563
- config_name: bbh_cot_fewshot_web_of_lies
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1187683
num_examples: 250
download_size: 1182427
dataset_size: 1187683
- config_name: bbh_cot_fewshot_word_sorting
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: input
dtype: string
- name: target
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1399397
num_examples: 250
download_size: 1405145
dataset_size: 1399397
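Together, the 27 `bbh_cot_fewshot_*` configs above hold 6,511 examples: 250 each, except `causal_judgement` (187), `penguins_in_a_table` (146) and `snarks` (178). Because subtask sizes differ, a macro average (mean of per-subtask means) and a micro average (mean over all examples) can disagree slightly. This card does not say which one the original evaluation reported. The sketch assumes each row's `score` is per-example correctness.

```python
from statistics import mean


def bbh_averages(scores_by_task: dict) -> tuple:
    """Return (macro, micro) from {subtask: [per-example score, ...]}."""
    macro = mean(mean(scores) for scores in scores_by_task.values())
    pooled = [s for scores in scores_by_task.values() for s in scores]
    micro = mean(pooled)
    return macro, micro


# Toy data: subtask "a" has mean 0.5, subtask "b" has mean 1.0
macro, micro = bbh_averages({"a": [1.0, 1.0, 0.0, 0.0], "b": [1.0]})
```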
- config_name: cleanslate_qa
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: string
- name: content_id
dtype: string
- name: content_title
dtype: string
- name: question
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 8344844
num_examples: 12088
download_size: 7454115
dataset_size: 8344844
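`cleanslate_qa` (12,088 rows) ties each question to a source item through `doc.content_id` and `doc.content_title`. A minimal sketch that averages `score` per source item can show which items the model still answers after unlearning. The sketch assumes a higher `score` means the reference answer was recovered. The IDs in the example are made up.

```python
from collections import defaultdict


def score_by_content(records: list) -> dict:
    """Mean `score` per (content_id, content_title)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        key = (r["doc"]["content_id"], r["doc"]["content_title"])
        sums[key] += r["score"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


rows = [  # hypothetical rows, reduced to the fields the function reads
    {"doc": {"content_id": "c1", "content_title": "Item one"}, "score": 1.0},
    {"doc": {"content_id": "c1", "content_title": "Item one"}, "score": 0.0},
    {"doc": {"content_id": "c2", "content_title": "Item two"}, "score": 1.0},
]
per_item = score_by_content(rows)
```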
- config_name: coqa
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: additional_answers
struct:
- name: '0'
struct:
- name: input_text
list: string
- name: span_end
list: int64
- name: span_start
list: int64
- name: span_text
list: string
- name: turn_id
list: int64
- name: '1'
struct:
- name: input_text
list: string
- name: span_end
list: int64
- name: span_start
list: int64
- name: span_text
list: string
- name: turn_id
list: int64
- name: '2'
struct:
- name: input_text
list: string
- name: span_end
list: int64
- name: span_start
list: int64
- name: span_text
list: string
- name: turn_id
list: int64
- name: answers
struct:
- name: input_text
list: string
- name: span_end
list: int64
- name: span_start
list: int64
- name: span_text
list: string
- name: turn_id
list: int64
- name: id
dtype: string
- name: questions
struct:
- name: input_text
list: string
- name: turn_id
list: int64
- name: source
dtype: string
- name: story
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: float64
- name: score
dtype: float64
splits:
- name: train
num_bytes: 5862382
num_examples: 500
download_size: 5871038
dataset_size: 5862382
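CoQA is conversational. `questions.input_text` and `answers.input_text` are parallel lists indexed by turn. `additional_answers` (keys `"0"`, `"1"`, `"2"`) holds further annotators' answers in the same layout. A sketch that collects every gold answer for one turn follows; one common approach scores a prediction against its best-matching gold answer. It assumes that missing annotator sets appear as `None` or as shorter lists.

```python
def turns(doc: dict) -> list:
    """(question, primary answer) pairs in turn order."""
    return list(zip(doc["questions"]["input_text"], doc["answers"]["input_text"]))


def gold_answers(doc: dict, turn: int) -> list:
    """Primary answer plus any additional annotator answers for 0-based `turn`."""
    answers = [doc["answers"]["input_text"][turn]]
    extra_sets = doc.get("additional_answers") or {}
    for key in sorted(extra_sets):  # "0", "1", "2"
        extra = extra_sets[key]
        if extra and turn < len(extra["input_text"]):
            answers.append(extra["input_text"][turn])
    return answers


doc = {  # minimal hypothetical doc with only the fields used above
    "questions": {"input_text": ["Who?", "Where?"]},
    "answers": {"input_text": ["Ann", "home"]},
    "additional_answers": {"0": {"input_text": ["Ann", "at home"]}, "1": None, "2": {"input_text": ["Anne"]}},
}
```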
- config_name: drop
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
struct:
- name: date
struct:
- name: day
dtype: string
- name: month
dtype: string
- name: year
dtype: string
- name: hit_id
dtype: string
- name: number
dtype: string
- name: spans
list: string
- name: worker_id
dtype: string
- name: answers
list:
list: string
- name: id
dtype: string
- name: passage
dtype: string
- name: query_id
dtype: string
- name: question
dtype: string
- name: section_id
dtype: string
- name: validated_answers
struct:
- name: date
list:
- name: day
dtype: string
- name: month
dtype: string
- name: year
dtype: string
- name: hit_id
list: string
- name: number
list: string
- name: spans
list:
list: string
- name: worker_id
list: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 30383384
num_examples: 9536
download_size: 28640383
dataset_size: 30383384
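A DROP answer is a number, a list of spans, or a date, and only one of these is filled. `validated_answers` stores the same fields column-wise (one list per field). The sketch below turns both into comparable string tuples and is modeled loosely on common DROP preprocessing. The `doc.answers` field (a list of lists of strings) may already hold this parsed form.

```python
def parse_answer(answer: dict) -> tuple:
    """Number if present, else spans, else 'day month year' (empty parts dropped at the ends)."""
    if answer["number"] != "":
        return (str(answer["number"]),)
    if answer["spans"]:
        return tuple(answer["spans"])
    date = answer["date"]
    return (" ".join([date["day"], date["month"], date["year"]]).strip(),)


def validated_as_rows(validated: dict) -> list:
    """Turn the column-oriented validated_answers struct into one dict per answer."""
    return [
        {"number": n, "spans": s, "date": d}
        for n, s, d in zip(validated["number"], validated["spans"], validated["date"])
    ]


def all_gold(doc: dict) -> list:
    golds = [parse_answer(doc["answer"])]
    golds += [parse_answer(a) for a in validated_as_rows(doc["validated_answers"])]
    return golds
```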
- config_name: gsm8k
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: string
- name: question
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 12372995
num_examples: 2638
download_size: 11437698
dataset_size: 12372995
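Every GSM8K reference solution in `doc.answer` ends with `#### <number>`. The sketch compares that value with the last number in a model response. The "last number" heuristic is an assumption; the `filter` column records which extraction the evaluation actually applied.

```python
import re
from typing import Optional

_NUM = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def gold_number(answer: str) -> str:
    """Final answer after '####', with thousands separators removed."""
    return answer.split("####")[-1].strip().replace(",", "")


def last_number(response: str) -> Optional[str]:
    """Heuristic: last number appearing in the response, commas removed."""
    found = _NUM.findall(response)
    return found[-1].replace(",", "") if found else None


def gsm8k_correct(doc: dict, response: str) -> bool:
    return last_number(response) == gold_number(doc["answer"])
```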
- config_name: hellaswag
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: activity_label
dtype: string
- name: choices
list: string
- name: ctx
dtype: string
- name: ctx_a
dtype: string
- name: ctx_b
dtype: string
- name: endings
list: string
- name: gold
dtype: int64
- name: ind
dtype: int64
- name: label
dtype: string
- name: query
dtype: string
- name: source_id
dtype: string
- name: split
dtype: string
- name: split_type
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 39603139
num_examples: 10042
download_size: 38113927
dataset_size: 39603139
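HellaSwag is scored from per-ending log-likelihoods, with `doc.gold` as the correct index. Plain accuracy takes the raw argmax. The length-normalized variant divides each log-likelihood by the ending's character length first, which stops long endings from being penalized just for having more tokens. The sketch assumes character-length normalization, the convention lm-evaluation-harness uses for `acc_norm`.

```python
def argmax(values: list) -> int:
    return max(range(len(values)), key=values.__getitem__)


def acc_and_acc_norm(logliks: list, endings: list, gold: int) -> tuple:
    """(acc, acc_norm) for one document; logliks[i] scores endings[i]."""
    acc = int(argmax(logliks) == gold)
    normed = [ll / len(text) for ll, text in zip(logliks, endings)]
    acc_norm = int(argmax(normed) == gold)
    return acc, acc_norm


endings = [" a short one", " a considerably longer ending here"]  # 12 and 34 characters
result = acc_and_acc_norm([-10.0, -14.0], endings, gold=1)  # raw argmax 0, normalized argmax 1
```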
- config_name: humaneval_plus
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: canonical_solution
dtype: string
- name: entry_point
dtype: string
- name: prompt
dtype: string
- name: task_id
dtype: string
- name: test
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: max_gen_toks
dtype: int64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: 'null'
- name: score
dtype: float64
splits:
- name: train
num_bytes: 22497326
num_examples: 164
download_size: 14489866
dataset_size: 22497326
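`humaneval_plus` rows carry the task `prompt`, its `test` code (which defines `check(candidate)`), and `entry_point`. `filtered_resps` is a list of lists, so a problem may carry several completions. The sketch below rebuilds the standard HumanEval check program and runs it in a subprocess. It assumes the completion is the function body only; if the stored completion already includes the prompt, drop `doc["prompt"]`. This executes model-written code, so run it only inside a sandbox.

```python
import subprocess
import sys


def passes(doc: dict, completion: str, timeout: float = 10.0) -> bool:
    """Run prompt + completion + tests; True if check(entry_point) raises nothing."""
    program = (
        doc["prompt"] + completion + "\n\n"
        + doc["test"] + "\n\n"
        + f"check({doc['entry_point']})\n"
    )
    try:
        result = subprocess.run([sys.executable, "-c", program], capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


toy = {  # hypothetical task in HumanEval layout
    "prompt": "def add(a, b):\n",
    "test": "def check(candidate):\n    assert candidate(2, 3) == 5\n",
    "entry_point": "add",
}
```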
- config_name: lambada_openai
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: text
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 5115477
num_examples: 5153
download_size: 4753497
dataset_size: 5115477
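LAMBADA scores one continuation per document. Accuracy is the share of documents where the target word is the greedy continuation. Perplexity is `exp(-mean log-likelihood)` over documents. The sketch assumes `filtered_resps[0]` is a `[log_likelihood, is_greedy]` pair serialized as strings such as `"-1.5"` and `"True"`.

```python
import math


def parse_pair(pair: list) -> tuple:
    """Assumed string layout: [loglik, is_greedy]."""
    return float(pair[0]), pair[1] == "True"


def lambada_metrics(pairs: list) -> tuple:
    """(accuracy, perplexity) from per-document (loglik, is_greedy) tuples."""
    acc = sum(1 for _, greedy in pairs if greedy) / len(pairs)
    ppl = math.exp(-sum(ll for ll, _ in pairs) / len(pairs))
    return acc, ppl


acc, ppl = lambada_metrics([parse_pair(["-1.0", "True"]), parse_pair(["-3.0", "False"])])
```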
- config_name: mmlu_abstract_algebra
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 187342
num_examples: 100
download_size: 189057
dataset_size: 187342
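The `mmlu_*` configs store `doc.answer` as an integer index into four `choices`. The sketch formats a zero-shot prompt with the hypothetical template "question, lettered options, `Answer:`" and maps the gold index to its letter. Compare the output with `arguments.gen_args_0.arg_0` to see the prompt that was actually used.

```python
LETTERS = "ABCD"


def format_mmlu(doc: dict) -> str:
    """Hypothetical template: question, lettered options, then 'Answer:'."""
    lines = [doc["question"].strip()]
    lines += [f"{LETTERS[i]}. {choice}" for i, choice in enumerate(doc["choices"])]
    lines.append("Answer:")
    return "\n".join(lines)


def gold_letter(doc: dict) -> str:
    return LETTERS[doc["answer"]]


doc = {"question": "Order of Z_4? ", "choices": ["1", "2", "4", "8"], "answer": 2}
```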
- config_name: mmlu_anatomy
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 281987
num_examples: 135
download_size: 280149
dataset_size: 281987
- config_name: mmlu_astronomy
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 366447
num_examples: 152
download_size: 363581
dataset_size: 366447
- config_name: mmlu_business_ethics
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 256422
num_examples: 100
download_size: 259792
dataset_size: 256422
- config_name: mmlu_clinical_knowledge
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 556689
num_examples: 265
download_size: 534952
dataset_size: 556689
- config_name: mmlu_college_biology
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 373815
num_examples: 144
download_size: 370714
dataset_size: 373815
- config_name: mmlu_college_chemistry
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 214822
num_examples: 100
download_size: 219143
dataset_size: 214822
- config_name: mmlu_college_computer_science
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 307869
num_examples: 100
download_size: 316697
dataset_size: 307869
- config_name: mmlu_college_mathematics
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 215809
num_examples: 100
download_size: 217170
dataset_size: 215809
- config_name: mmlu_college_medicine
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 568874
num_examples: 173
download_size: 564336
dataset_size: 568874
- config_name: mmlu_college_physics
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 242897
num_examples: 102
download_size: 245919
dataset_size: 242897
- config_name: mmlu_computer_security
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 226840
num_examples: 100
download_size: 228393
dataset_size: 226840
- config_name: mmlu_conceptual_physics
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 419038
num_examples: 235
download_size: 400039
dataset_size: 419038
- config_name: mmlu_econometrics
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 333825
num_examples: 114
download_size: 335129
dataset_size: 333825
- config_name: mmlu_electrical_engineering
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 261575
num_examples: 145
download_size: 255624
dataset_size: 261575
- config_name: mmlu_elementary_mathematics
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 704818
num_examples: 378
download_size: 661476
dataset_size: 704818
- config_name: mmlu_formal_logic
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 360391
num_examples: 126
download_size: 361544
dataset_size: 360391
- config_name: mmlu_global_facts
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 180735
num_examples: 100
download_size: 181939
dataset_size: 180735
- config_name: mmlu_high_school_biology
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 834390
num_examples: 310
download_size: 806139
dataset_size: 834390
- config_name: mmlu_high_school_chemistry
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 481500
num_examples: 203
download_size: 467780
dataset_size: 481500
- config_name: mmlu_high_school_computer_science
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 319044
num_examples: 100
download_size: 325381
dataset_size: 319044
- config_name: mmlu_high_school_european_history
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1510917
num_examples: 165
download_size: 1524553
dataset_size: 1510917
- config_name: mmlu_high_school_geography
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 394704
num_examples: 198
download_size: 381318
dataset_size: 394704
- config_name: mmlu_high_school_government_and_politics
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 523758
num_examples: 193
download_size: 513333
dataset_size: 523758
- config_name: mmlu_high_school_macroeconomics
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 961761
num_examples: 390
download_size: 921567
dataset_size: 961761
- config_name: mmlu_high_school_mathematics
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 528744
num_examples: 270
download_size: 507246
dataset_size: 528744
- config_name: mmlu_high_school_microeconomics
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 606255
num_examples: 238
download_size: 588706
dataset_size: 606255
- config_name: mmlu_high_school_physics
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 436906
num_examples: 151
download_size: 439517
dataset_size: 436906
- config_name: mmlu_high_school_psychology
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1307792
num_examples: 545
download_size: 1240638
dataset_size: 1307792
- config_name: mmlu_high_school_statistics
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 755926
num_examples: 216
download_size: 743261
dataset_size: 755926
- config_name: mmlu_high_school_us_history
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1674732
num_examples: 204
download_size: 1685879
dataset_size: 1674732
- config_name: mmlu_high_school_world_history
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 2118539
num_examples: 237
download_size: 2123432
dataset_size: 2118539
- config_name: mmlu_human_aging
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 427272
num_examples: 223
download_size: 410163
dataset_size: 427272
- config_name: mmlu_human_sexuality
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 278792
num_examples: 131
download_size: 275949
dataset_size: 278792
- config_name: mmlu_international_law
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 378013
num_examples: 121
download_size: 383664
dataset_size: 378013
- config_name: mmlu_jurisprudence
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 266226
num_examples: 108
download_size: 266102
dataset_size: 266226
- config_name: mmlu_logical_fallacies
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 399293
num_examples: 163
download_size: 394013
dataset_size: 399293
- config_name: mmlu_machine_learning
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 270894
num_examples: 112
download_size: 270736
dataset_size: 270894
- config_name: mmlu_management
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 190303
num_examples: 103
download_size: 191053
dataset_size: 190303
- config_name: mmlu_marketing
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 518661
num_examples: 234
download_size: 502007
dataset_size: 518661
- config_name: mmlu_medical_genetics
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 194944
num_examples: 100
download_size: 198468
dataset_size: 194944
- config_name: mmlu_miscellaneous
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1437464
num_examples: 783
download_size: 1339800
dataset_size: 1437464
- config_name: mmlu_moral_disputes
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 848044
num_examples: 346
download_size: 813146
dataset_size: 848044
- config_name: mmlu_moral_scenarios
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 2677722
num_examples: 895
download_size: 2561951
dataset_size: 2677722
- config_name: mmlu_nutrition
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 727718
num_examples: 306
download_size: 702483
dataset_size: 727718
- config_name: mmlu_philosophy
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 673190
num_examples: 311
download_size: 645801
dataset_size: 673190
- config_name: mmlu_prehistory
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 732288
num_examples: 324
download_size: 702350
dataset_size: 732288
- config_name: mmlu_professional_accounting
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 888598
num_examples: 282
download_size: 865231
dataset_size: 888598
- config_name: mmlu_professional_law
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 10851010
num_examples: 1534
download_size: 10726323
dataset_size: 10851010
- config_name: mmlu_professional_medicine
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1341309
num_examples: 272
download_size: 1336718
dataset_size: 1341309
- config_name: mmlu_professional_psychology
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1706303
num_examples: 612
download_size: 1632698
dataset_size: 1706303
- config_name: mmlu_public_relations
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 243566
num_examples: 110
download_size: 244568
dataset_size: 243566
- config_name: mmlu_security_studies
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 1246441
num_examples: 245
download_size: 1235175
dataset_size: 1246441
- config_name: mmlu_sociology
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 506483
num_examples: 201
download_size: 493806
dataset_size: 506483
- config_name: mmlu_us_foreign_policy
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 233331
num_examples: 100
download_size: 234944
dataset_size: 233331
- config_name: mmlu_virology
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 337727
num_examples: 166
download_size: 331520
dataset_size: 337727
- config_name: mmlu_world_religions
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: int64
- name: choices
list: string
- name: question
dtype: string
- name: subject
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_2
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_3
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 280344
num_examples: 171
download_size: 270570
dataset_size: 280344
- config_name: triviaqa
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
struct:
- name: aliases
list: string
- name: matched_wiki_entity_name
dtype: string
- name: normalized_aliases
list: string
- name: normalized_matched_wiki_entity_name
dtype: string
- name: normalized_value
dtype: string
- name: type
dtype: string
- name: value
dtype: string
- name: entity_pages
struct:
- name: doc_source
list: 'null'
- name: filename
list: 'null'
- name: title
list: 'null'
- name: wiki_context
list: 'null'
- name: question
dtype: string
- name: question_id
dtype: string
- name: question_source
dtype: string
- name: search_results
struct:
- name: description
list: 'null'
- name: filename
list: 'null'
- name: rank
list: 'null'
- name: search_context
list: 'null'
- name: title
list: 'null'
- name: url
list: 'null'
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
struct:
- name: do_sample
dtype: bool
- name: temperature
dtype: float64
- name: until
list: string
- name: resps
list:
list: string
- name: filtered_resps
list: string
- name: filter
dtype: string
- name: metrics
list: string
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: bypass
dtype: float64
- name: score
dtype: float64
splits:
- name: train
num_bytes: 28366603
num_examples: 17944
download_size: 21644041
dataset_size: 28366603
- config_name: winogrande
features:
- name: doc_id
dtype: int64
- name: doc
struct:
- name: answer
dtype: string
- name: option1
dtype: string
- name: option2
dtype: string
- name: sentence
dtype: string
- name: target
dtype: string
- name: arguments
struct:
- name: gen_args_0
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: gen_args_1
struct:
- name: arg_0
dtype: string
- name: arg_1
dtype: string
- name: resps
list:
list:
list: string
- name: filtered_resps
list:
list: string
- name: filter
dtype: string
- name: metrics
list: 'null'
- name: doc_hash
dtype: string
- name: prompt_hash
dtype: string
- name: target_hash
dtype: string
- name: score
dtype: float64
splits:
- name: train
num_bytes: 981859
num_examples: 1267
download_size: 884773
dataset_size: 981859
configs:
- config_name: arc_challenge
data_files:
- split: train
path: arc_challenge/train-*
- config_name: bbh_cot_fewshot_boolean_expressions
data_files:
- split: train
path: bbh_cot_fewshot_boolean_expressions/train-*
- config_name: bbh_cot_fewshot_causal_judgement
data_files:
- split: train
path: bbh_cot_fewshot_causal_judgement/train-*
- config_name: bbh_cot_fewshot_date_understanding
data_files:
- split: train
path: bbh_cot_fewshot_date_understanding/train-*
- config_name: bbh_cot_fewshot_disambiguation_qa
data_files:
- split: train
path: bbh_cot_fewshot_disambiguation_qa/train-*
- config_name: bbh_cot_fewshot_dyck_languages
data_files:
- split: train
path: bbh_cot_fewshot_dyck_languages/train-*
- config_name: bbh_cot_fewshot_formal_fallacies
data_files:
- split: train
path: bbh_cot_fewshot_formal_fallacies/train-*
- config_name: bbh_cot_fewshot_geometric_shapes
data_files:
- split: train
path: bbh_cot_fewshot_geometric_shapes/train-*
- config_name: bbh_cot_fewshot_hyperbaton
data_files:
- split: train
path: bbh_cot_fewshot_hyperbaton/train-*
- config_name: bbh_cot_fewshot_logical_deduction_five_objects
data_files:
- split: train
path: bbh_cot_fewshot_logical_deduction_five_objects/train-*
- config_name: bbh_cot_fewshot_logical_deduction_seven_objects
data_files:
- split: train
path: bbh_cot_fewshot_logical_deduction_seven_objects/train-*
- config_name: bbh_cot_fewshot_logical_deduction_three_objects
data_files:
- split: train
path: bbh_cot_fewshot_logical_deduction_three_objects/train-*
- config_name: bbh_cot_fewshot_movie_recommendation
data_files:
- split: train
path: bbh_cot_fewshot_movie_recommendation/train-*
- config_name: bbh_cot_fewshot_multistep_arithmetic_two
data_files:
- split: train
path: bbh_cot_fewshot_multistep_arithmetic_two/train-*
- config_name: bbh_cot_fewshot_navigate
data_files:
- split: train
path: bbh_cot_fewshot_navigate/train-*
- config_name: bbh_cot_fewshot_object_counting
data_files:
- split: train
path: bbh_cot_fewshot_object_counting/train-*
- config_name: bbh_cot_fewshot_penguins_in_a_table
data_files:
- split: train
path: bbh_cot_fewshot_penguins_in_a_table/train-*
- config_name: bbh_cot_fewshot_reasoning_about_colored_objects
data_files:
- split: train
path: bbh_cot_fewshot_reasoning_about_colored_objects/train-*
- config_name: bbh_cot_fewshot_ruin_names
data_files:
- split: train
path: bbh_cot_fewshot_ruin_names/train-*
- config_name: bbh_cot_fewshot_salient_translation_error_detection
data_files:
- split: train
path: bbh_cot_fewshot_salient_translation_error_detection/train-*
- config_name: bbh_cot_fewshot_snarks
data_files:
- split: train
path: bbh_cot_fewshot_snarks/train-*
- config_name: bbh_cot_fewshot_sports_understanding
data_files:
- split: train
path: bbh_cot_fewshot_sports_understanding/train-*
- config_name: bbh_cot_fewshot_temporal_sequences
data_files:
- split: train
path: bbh_cot_fewshot_temporal_sequences/train-*
- config_name: bbh_cot_fewshot_tracking_shuffled_objects_five_objects
data_files:
- split: train
path: bbh_cot_fewshot_tracking_shuffled_objects_five_objects/train-*
- config_name: bbh_cot_fewshot_tracking_shuffled_objects_seven_objects
data_files:
- split: train
path: bbh_cot_fewshot_tracking_shuffled_objects_seven_objects/train-*
- config_name: bbh_cot_fewshot_tracking_shuffled_objects_three_objects
data_files:
- split: train
path: bbh_cot_fewshot_tracking_shuffled_objects_three_objects/train-*
- config_name: bbh_cot_fewshot_web_of_lies
data_files:
- split: train
path: bbh_cot_fewshot_web_of_lies/train-*
- config_name: bbh_cot_fewshot_word_sorting
data_files:
- split: train
path: bbh_cot_fewshot_word_sorting/train-*
- config_name: cleanslate_qa
data_files:
- split: train
path: cleanslate_qa/train-*
- config_name: coqa
data_files:
- split: train
path: coqa/train-*
- config_name: drop
data_files:
- split: train
path: drop/train-*
- config_name: gsm8k
data_files:
- split: train
path: gsm8k/train-*
- config_name: hellaswag
data_files:
- split: train
path: hellaswag/train-*
- config_name: humaneval_plus
data_files:
- split: train
path: humaneval_plus/train-*
- config_name: lambada_openai
data_files:
- split: train
path: lambada_openai/train-*
- config_name: mmlu_abstract_algebra
data_files:
- split: train
path: mmlu_abstract_algebra/train-*
- config_name: mmlu_anatomy
data_files:
- split: train
path: mmlu_anatomy/train-*
- config_name: mmlu_astronomy
data_files:
- split: train
path: mmlu_astronomy/train-*
- config_name: mmlu_business_ethics
data_files:
- split: train
path: mmlu_business_ethics/train-*
- config_name: mmlu_clinical_knowledge
data_files:
- split: train
path: mmlu_clinical_knowledge/train-*
- config_name: mmlu_college_biology
data_files:
- split: train
path: mmlu_college_biology/train-*
- config_name: mmlu_college_chemistry
data_files:
- split: train
path: mmlu_college_chemistry/train-*
- config_name: mmlu_college_computer_science
data_files:
- split: train
path: mmlu_college_computer_science/train-*
- config_name: mmlu_college_mathematics
data_files:
- split: train
path: mmlu_college_mathematics/train-*
- config_name: mmlu_college_medicine
data_files:
- split: train
path: mmlu_college_medicine/train-*
- config_name: mmlu_college_physics
data_files:
- split: train
path: mmlu_college_physics/train-*
- config_name: mmlu_computer_security
data_files:
- split: train
path: mmlu_computer_security/train-*
- config_name: mmlu_conceptual_physics
data_files:
- split: train
path: mmlu_conceptual_physics/train-*
- config_name: mmlu_econometrics
data_files:
- split: train
path: mmlu_econometrics/train-*
- config_name: mmlu_electrical_engineering
data_files:
- split: train
path: mmlu_electrical_engineering/train-*
- config_name: mmlu_elementary_mathematics
data_files:
- split: train
path: mmlu_elementary_mathematics/train-*
- config_name: mmlu_formal_logic
data_files:
- split: train
path: mmlu_formal_logic/train-*
- config_name: mmlu_global_facts
data_files:
- split: train
path: mmlu_global_facts/train-*
- config_name: mmlu_high_school_biology
data_files:
- split: train
path: mmlu_high_school_biology/train-*
- config_name: mmlu_high_school_chemistry
data_files:
- split: train
path: mmlu_high_school_chemistry/train-*
- config_name: mmlu_high_school_computer_science
data_files:
- split: train
path: mmlu_high_school_computer_science/train-*
- config_name: mmlu_high_school_european_history
data_files:
- split: train
path: mmlu_high_school_european_history/train-*
- config_name: mmlu_high_school_geography
data_files:
- split: train
path: mmlu_high_school_geography/train-*
- config_name: mmlu_high_school_government_and_politics
data_files:
- split: train
path: mmlu_high_school_government_and_politics/train-*
- config_name: mmlu_high_school_macroeconomics
data_files:
- split: train
path: mmlu_high_school_macroeconomics/train-*
- config_name: mmlu_high_school_mathematics
data_files:
- split: train
path: mmlu_high_school_mathematics/train-*
- config_name: mmlu_high_school_microeconomics
data_files:
- split: train
path: mmlu_high_school_microeconomics/train-*
- config_name: mmlu_high_school_physics
data_files:
- split: train
path: mmlu_high_school_physics/train-*
- config_name: mmlu_high_school_psychology
data_files:
- split: train
path: mmlu_high_school_psychology/train-*
- config_name: mmlu_high_school_statistics
data_files:
- split: train
path: mmlu_high_school_statistics/train-*
- config_name: mmlu_high_school_us_history
data_files:
- split: train
path: mmlu_high_school_us_history/train-*
- config_name: mmlu_high_school_world_history
data_files:
- split: train
path: mmlu_high_school_world_history/train-*
- config_name: mmlu_human_aging
data_files:
- split: train
path: mmlu_human_aging/train-*
- config_name: mmlu_human_sexuality
data_files:
- split: train
path: mmlu_human_sexuality/train-*
- config_name: mmlu_international_law
data_files:
- split: train
path: mmlu_international_law/train-*
- config_name: mmlu_jurisprudence
data_files:
- split: train
path: mmlu_jurisprudence/train-*
- config_name: mmlu_logical_fallacies
data_files:
- split: train
path: mmlu_logical_fallacies/train-*
- config_name: mmlu_machine_learning
data_files:
- split: train
path: mmlu_machine_learning/train-*
- config_name: mmlu_management
data_files:
- split: train
path: mmlu_management/train-*
- config_name: mmlu_marketing
data_files:
- split: train
path: mmlu_marketing/train-*
- config_name: mmlu_medical_genetics
data_files:
- split: train
path: mmlu_medical_genetics/train-*
- config_name: mmlu_miscellaneous
data_files:
- split: train
path: mmlu_miscellaneous/train-*
- config_name: mmlu_moral_disputes
data_files:
- split: train
path: mmlu_moral_disputes/train-*
- config_name: mmlu_moral_scenarios
data_files:
- split: train
path: mmlu_moral_scenarios/train-*
- config_name: mmlu_nutrition
data_files:
- split: train
path: mmlu_nutrition/train-*
- config_name: mmlu_philosophy
data_files:
- split: train
path: mmlu_philosophy/train-*
- config_name: mmlu_prehistory
data_files:
- split: train
path: mmlu_prehistory/train-*
- config_name: mmlu_professional_accounting
data_files:
- split: train
path: mmlu_professional_accounting/train-*
- config_name: mmlu_professional_law
data_files:
- split: train
path: mmlu_professional_law/train-*
- config_name: mmlu_professional_medicine
data_files:
- split: train
path: mmlu_professional_medicine/train-*
- config_name: mmlu_professional_psychology
data_files:
- split: train
path: mmlu_professional_psychology/train-*
- config_name: mmlu_public_relations
data_files:
- split: train
path: mmlu_public_relations/train-*
- config_name: mmlu_security_studies
data_files:
- split: train
path: mmlu_security_studies/train-*
- config_name: mmlu_sociology
data_files:
- split: train
path: mmlu_sociology/train-*
- config_name: mmlu_us_foreign_policy
data_files:
- split: train
path: mmlu_us_foreign_policy/train-*
- config_name: mmlu_virology
data_files:
- split: train
path: mmlu_virology/train-*
- config_name: mmlu_world_religions
data_files:
- split: train
path: mmlu_world_religions/train-*
- config_name: triviaqa
data_files:
- split: train
path: triviaqa/train-*
- config_name: winogrande
data_files:
- split: train
path: winogrande/train-*
---
Provider:
unlearning-cleanslate
Collected and Summarized
Dataset Overview

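Construction
This dataset consists of generations from the Nemotron-Nano-9B-v2 model after fine-tuning with SimNPO (Simple Negative Preference Optimization, an LLM unlearning method; the word "gentle" comes from the dataset name, not from the method itself). It is intended for systematically evaluating the model across many reasoning and knowledge tasks. The prompts are drawn from established benchmarks: ARC-Challenge, the 27 BBH (Big-Bench Hard) subtasks with few-shot chain-of-thought prompting (covering commonsense reasoning, arithmetic, logical deduction, natural language understanding, and more), the MMLU subjects, TriviaQA, and Winogrande, each exposed as its own configuration. The fine-tuned model was run on these prompts with the generation parameters (sampling flag, temperature, maximum number of generated tokens, stop sequences) recorded for each sample, producing a set of raw responses. A filter step then post-processes the raw responses (for example, by extracting the final answer) to yield the filtered responses, and the name of the filter applied is stored with each sample. Each sample records the source document, the raw and filtered model responses, the target answer, and a per-sample score, giving a clearly structured, end-to-end record that is convenient for downstream analysis.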
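The step from raw responses (resps) to filtered responses (filtered_resps) can be pictured with a minimal Python sketch. It assumes a regex filter that pulls the final answer out of a chain-of-thought response and an exact-match score against target; the pattern, the function names, and the scoring rule are hypothetical illustrations, and the filter actually applied to each configuration is the one named in the filter field.

```python
import re

# HYPOTHETICAL answer-extraction pattern: captures the text after
# "the answer is" up to the end of the response, dropping a trailing period.
ANSWER_RE = re.compile(r"the answer is\s*(.+?)\.?\s*$", re.IGNORECASE)


def extract_answer(raw: str) -> str:
    """Return the extracted final answer, or the stripped raw text if no match."""
    match = ANSWER_RE.search(raw.strip())
    return match.group(1).strip() if match else raw.strip()


def apply_filter(resps: list[str]) -> list[str]:
    """Map each raw response to its filtered form (one output per input)."""
    return [extract_answer(r) for r in resps]


def exact_match(filtered: list[str], target: str) -> float:
    """Score 1.0 if the first filtered response equals the target, else 0.0."""
    return 1.0 if filtered and filtered[0] == target.strip() else 0.0


raw = ["not ( True ) and ( True ) is False.\nSo the answer is False."]
filtered = apply_filter(raw)
print(filtered, exact_match(filtered, "False"))
```

Keeping the unmatched raw text as the fallback means a response without the marker still gets scored (and almost always fails exact match) instead of being dropped silently.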
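Features
The dataset's most distinctive property is its focus on evaluating reasoning ability under a single, fixed model. All samples come from one model fine-tuned with SimNPO, so every configuration shares the same starting point; this makes the dataset a controlled reference for comparing reasoning strategies and the effects of fine-tuning. The content spans knowledge question answering, symbolic reasoning, and contextual understanding, across more than 20 configurations (including one per MMLU subject) and tens of thousands of samples in total, giving broad domain coverage at a substantial scale. Each sample keeps the model's raw responses (resps) and filtered responses (filtered_resps), together with the complete request arguments (arguments) that produced them, which makes it possible to attribute model behavior to specific inputs and settings. For multiple-choice configurations such as arc_challenge, arguments holds one (context, continuation) pair per answer option. In addition, the doc_hash, prompt_hash, and target_hash fields make every sample traceable, which supports reproducing and verifying experiments.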
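The traceability fields can be checked by recomputing a hash and comparing it with the stored value. This minimal sketch assumes each hash is a SHA-256 hex digest: target_hash of str(target) and prompt_hash of the first prompt string (arguments.gen_args_0.arg_0). That scheme appears to match the sample-logging format of EleutherAI's lm-evaluation-harness, but the card does not state it, so verify it on a real record before relying on it. The record below is synthetic.

```python
import hashlib
from typing import Any


def sha256_hex(text: str) -> str:
    """SHA-256 hex digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify_hashes(record: dict[str, Any]) -> dict[str, bool]:
    """Recompute target_hash and prompt_hash under the ASSUMED hashing scheme."""
    prompt = record["arguments"]["gen_args_0"]["arg_0"]  # ASSUMPTION: first prompt string
    return {
        "target_hash": sha256_hex(str(record["target"])) == record["target_hash"],
        "prompt_hash": sha256_hex(prompt) == record["prompt_hash"],
    }


# Synthetic record built under the same assumption, so both checks pass.
record = {
    "target": "False",
    "arguments": {"gen_args_0": {"arg_0": "Q: not ( True ) and ( True ) is\nA:"}},
}
record["target_hash"] = sha256_hex(record["target"])
record["prompt_hash"] = sha256_hex(record["arguments"]["gen_args_0"]["arg_0"])
print(verify_hashes(record))
```

If the checks fail on real data, the hashing scheme differs from the assumption, not necessarily the data.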
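Usage
The dataset can be loaded with the Hugging Face Datasets library by passing a configuration name, for example to select the ARC-Challenge data or a BBH logical-deduction task. Every sample has well-defined fields; doc (the original question and answer structure), target (the reference answer), and resps (the model's raw responses) are the core fields for performance analysis. The score field gives a per-sample quantitative measure that can be averaged directly to assess the model on a given task. The dataset supports several evaluation setups: filtered_resps can be used to compute answer accuracy, and where resps holds several candidate responses, they can be used to study diversity and stability. The generation parameters in arguments also let researchers examine how the decoding strategy affects reasoning results, providing empirical grounding for post-training and inference optimization. The data uses a standardized format compatible with mainstream NLP tooling and can be used without preprocessing.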
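Once a configuration has been loaded (for example with datasets.load_dataset(repo_id, "arc_challenge", split="train"), where repo_id is unlearning-cleanslate/generations-nemotron-nano-9b-v2-simnpo-gentle-baseline), per-task accuracy can be read off the score field. The sketch below works on plain dicts so it runs without downloading anything. It assumes score is a 0/1 correctness value and that, for generative configurations, the sampling settings sit in arguments.gen_args_0.arg_1, as the schema in the card header suggests. The rows and their parameter values are made up.

```python
from statistics import mean
from typing import Any, Iterable, Optional


def accuracy(records: Iterable[dict[str, Any]]) -> float:
    """Mean of the per-sample score field (ASSUMED to be 0.0 or 1.0)."""
    scores = [float(r["score"]) for r in records if r.get("score") is not None]
    return mean(scores) if scores else float("nan")


def generation_settings(record: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the generation kwargs of a generative task, or None.

    For generative configs (e.g. BBH CoT), arg_1 is a struct with do_sample,
    max_gen_toks, temperature and until; for multiple-choice configs such as
    arc_challenge it is a plain string (the answer continuation).
    """
    arg_1 = record.get("arguments", {}).get("gen_args_0", {}).get("arg_1")
    return arg_1 if isinstance(arg_1, dict) else None


# Synthetic rows with made-up generation settings.
gen = {"do_sample": False, "max_gen_toks": 256, "temperature": 0.0, "until": ["Q:"]}
bbh_rows = [
    {"score": 1.0, "arguments": {"gen_args_0": {"arg_0": "Q1", "arg_1": gen}}},
    {"score": 0.0, "arguments": {"gen_args_0": {"arg_0": "Q2", "arg_1": gen}}},
]
print(accuracy(bbh_rows), generation_settings(bbh_rows[0])["temperature"])
```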
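Background and Challenges
Background
This dataset was recently created by unlearning-cleanslate to evaluate NVIDIA's Nemotron-Nano-9B-v2 model on complex reasoning tasks after SimNPO fine-tuning. It covers ARC-Challenge, BBH, MMLU, TriviaQA, Winogrande, and other benchmarks, and includes the generated responses, filtered results, and scores, providing a fine-grained evaluation framework for studying the reasoning ability, the effects of preference-optimization-based fine-tuning, and the generation quality of large language models. Its release is useful for benchmarking small models on mathematical, logical, and scientific reasoning.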
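Current Challenges
The core challenges are evaluating multi-dimensional reasoning accurately and controlling noise. At the problem level, the tasks range from commonsense question answering to formal fallacies and multi-step arithmetic, so scores must be made comparable across very different tasks. At the construction level, the challenges are standardizing the generation parameters, making the filtering strategy robust, and ensuring that the scores reflect the depth of the model's reasoning in complex logical and long-form settings.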
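One concrete instance of the metric-unification problem is how per-task scores are pooled into a single number. The sketch contrasts micro-averaging (every sample weighs the same, so configurations with more samples dominate) with macro-averaging (every task weighs the same). The task names and scores are made up.

```python
from statistics import mean


def micro_macro(per_task: dict[str, list[float]]) -> tuple[float, float]:
    """Return (micro, macro) averages of per-sample scores grouped by task.

    Tasks with no scores are skipped; at least one score must be present overall.
    """
    non_empty = {task: s for task, s in per_task.items() if s}
    micro = mean(x for s in non_empty.values() for x in s)
    macro = mean(mean(s) for s in non_empty.values())
    return micro, macro


scores = {"big_task": [1.0, 1.0, 1.0, 1.0], "small_task": [0.0]}
print(micro_macro(scores))  # (0.8, 0.5)
```

Reporting both makes it visible when a headline number is driven by a few large configurations.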
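Common Scenarios
Classic Use Cases
At the intersection of natural language processing and artificial intelligence, this dataset is designed for evaluating and improving the reasoning ability of large language models. Typical uses include science question answering (ARC-Challenge), knowledge question answering (the MMLU subjects and TriviaQA), commonsense pronoun resolution (Winogrande), and multi-domain logical reasoning from the BBH suite: boolean expressions, causal judgement, date understanding, disambiguation QA, Dyck languages, formal fallacies, geometric shapes, hyperbaton (adjective ordering), logical deduction, movie recommendation, multi-step arithmetic, navigation, object counting, penguins in a table, reasoning about colored objects, ruin names, and others. With these varied task configurations, researchers can systematically examine how the model performs along complex reasoning chains and identify its strengths and weaknesses in commonsense understanding, symbolic manipulation, spatial reasoning, and arithmetic.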
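Because each task is a separate configuration, a first step in a multi-task study is often to group configuration names by benchmark. This sketch relies only on the prefixes visible in this card (bbh_cot_fewshot_ and mmlu_); any other name is treated as its own family. The example list is a small hand-typed subset.

```python
from collections import defaultdict

# Prefixes taken from the config names in this card's YAML header.
FAMILY_PREFIXES = ("bbh_cot_fewshot_", "mmlu_")


def family(config_name: str) -> str:
    """Map a config name to its benchmark family."""
    for prefix in FAMILY_PREFIXES:
        if config_name.startswith(prefix):
            return prefix.rstrip("_")
    return config_name


def group_configs(names: list[str]) -> dict[str, list[str]]:
    """Group config names by family, preserving input order within each group."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[family(name)].append(name)
    return dict(groups)


configs = [
    "arc_challenge",
    "bbh_cot_fewshot_boolean_expressions",
    "mmlu_virology",
    "mmlu_world_religions",
    "triviaqa",
    "winogrande",
]
print(group_configs(configs))
```

With network access, datasets.get_dataset_config_names(repo_id) should return the full list of configuration names to pass in.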
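Practical Applications
In practice, the dataset can support the development of more reliable AI assistants and decision-support systems. In educational technology, for example, it could be used to build tools that automatically assess students' logical reasoning; in enterprise knowledge management, it can help tune LLM-based question-answering systems so that they make fewer errors on multi-hop queries. The navigation and counting tasks relate indirectly to robot path planning and inventory management, while movie recommendation and date understanding bear on recommender and scheduling applications. Its standardized generation format also makes it easy to integrate into automated test pipelines for continuous verification across model versions.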
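For the continuous-verification use case, a test pipeline might compare per-task accuracy of a new model version against reference numbers (such as those computed from this dataset) and flag drops beyond a tolerance. The function name, the 0.02 default tolerance, and the example accuracies are hypothetical choices for illustration.

```python
def find_regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.02,
) -> dict[str, float]:
    """Return {task: candidate - baseline} for tasks that dropped by more than tolerance.

    Tasks missing from candidate are ignored here; a real pipeline may want to fail on them.
    """
    return {
        task: candidate[task] - base
        for task, base in baseline.items()
        if task in candidate and candidate[task] < base - tolerance
    }


baseline = {"arc_challenge": 0.80, "mmlu_virology": 0.50}
candidate = {"arc_challenge": 0.70, "mmlu_virology": 0.51}
drops = find_regressions(baseline, candidate)
print(sorted(drops))  # ['arc_challenge']
```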
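Related Work
This dataset can support several lines of follow-up work. SimNPO itself is the unlearning method used to produce the evaluated model, not a result derived from this dataset. Possible uses include comparing raw and filtered responses to study answer extraction and self-correction, using the generations as training data for reasoning distillation, hash-based deduplication and response-quality screening via the doc_hash, prompt_hash, and target_hash fields, and studying how generation hyperparameters such as temperature and maximum generation length affect the coherence of reasoning chains. Together, such work can inform post-training and evaluation methodology for large language models.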
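Hash-based deduplication, mentioned above, can be sketched in a few lines: keep the first record for each prompt_hash and count how many duplicates were dropped. Keying on prompt_hash is an assumption; doc_hash could be used the same way to deduplicate by source question rather than by rendered prompt. The rows and hash values are made up.

```python
from typing import Any


def dedup_by_hash(
    records: list[dict[str, Any]], key: str = "prompt_hash"
) -> tuple[list[dict[str, Any]], int]:
    """Keep the first record per hash value; return (kept_records, num_dropped)."""
    seen: set[str] = set()
    kept: list[dict[str, Any]] = []
    for record in records:
        h = record[key]
        if h in seen:
            continue  # later duplicate of an already-kept prompt
        seen.add(h)
        kept.append(record)
    return kept, len(records) - len(kept)


rows = [
    {"doc_id": 0, "prompt_hash": "aa", "score": 1.0},
    {"doc_id": 1, "prompt_hash": "bb", "score": 0.0},
    {"doc_id": 2, "prompt_hash": "aa", "score": 1.0},
]
kept, dropped = dedup_by_hash(rows)
print([r["doc_id"] for r in kept], dropped)  # [0, 1] 1
```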
The above content was collected and summarized by 遇见数据集.



