unlearning-cleanslate/generations-20-DEBUG-llama-3_1-8b-simnpo-gentle-igm-10b-target-100-localtrain-checkpoint-1

Source: Hugging Face | Updated: 2026-05-01 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/unlearning-cleanslate/generations-20-DEBUG-llama-3_1-8b-simnpo-gentle-igm-10b-target-100-localtrain-checkpoint-1
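The sketch below loads one config through the mirror above and summarises it. It assumes three things:
- the `datasets` library is installed;
- hf-mirror.com honours the `HF_ENDPOINT` environment variable read by `huggingface_hub`;
- `score` is a per-sample metric.

Every config listed below exposes a single `train` split. `load_config`, `mean_score_by_filter` and `generation_settings` are hypothetical helpers written for this example. They are not part of any library or of the dataset's own tooling.

```python
import os
from collections import defaultdict

# Assumption: hf-mirror.com acts as a drop-in Hugging Face endpoint.
# HF_ENDPOINT must be set before `datasets`/`huggingface_hub` is first imported.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")

REPO_ID = (
    "unlearning-cleanslate/generations-20-DEBUG-llama-3_1-8b-simnpo-gentle-igm-"
    "10b-target-100-localtrain-checkpoint-1"
)


def load_config(config_name: str):
    """Return the `train` split of one config, e.g. "gsm8k" (needs network access)."""
    from datasets import load_dataset  # lazy import: the helpers below work without it

    return load_dataset(REPO_ID, config_name, split="train")


def mean_score_by_filter(rows) -> dict:
    """Average the per-row `score` column separately for each `filter` value.

    Assumption: rows logged under different filters are separate evaluations,
    so they are averaged per filter instead of being pooled together.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["filter"]] += row["score"]
        counts[row["filter"]] += 1
    return {name: totals[name] / counts[name] for name in totals}


def generation_settings(row) -> dict:
    """Return the generation kwargs of a generative row (bbh_*, gsm8k, coqa, ...).

    In those configs `arguments.gen_args_0.arg_1` is a struct (e.g. do_sample,
    max_gen_toks, temperature, until); in arc_challenge, hellaswag,
    lambada_openai and mmlu_* it is a plain string instead.
    """
    arg_1 = row["arguments"]["gen_args_0"]["arg_1"]
    if not isinstance(arg_1, dict):
        raise TypeError("arg_1 is a string here: this row is not a generative request")
    return arg_1
```

For example, `mean_score_by_filter(load_config("gsm8k"))` iterates the rows (a `datasets.Dataset` yields one dict per row) and returns one average per `filter` value. Grouping by `filter` avoids mixing rows that may repeat the same `doc_id` under different post-processing filters.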
Description:
---
dataset_info:
- config_name: arc_challenge
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: answerKey
      dtype: string
    - name: choices
      struct:
      - name: label
        list: string
      - name: text
        list: string
    - name: id
      dtype: string
    - name: question
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_1
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_2
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_3
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_4
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
  - name: resps
    list:
      list:
        list: string
  - name: filtered_resps
    list:
      list: string
  - name: filter
    dtype: string
  - name: metrics
    list: 'null'
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 1903382
    num_examples: 1172
  download_size: 1729139
  dataset_size: 1903382
- config_name: bbh_cot_fewshot_boolean_expressions
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 687863
    num_examples: 250
  download_size: 677006
  dataset_size: 687863
- config_name: bbh_cot_fewshot_causal_judgement
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 1214117
    num_examples: 187
  download_size: 1206191
  dataset_size: 1214117
- config_name: bbh_cot_fewshot_date_understanding
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 528855
    num_examples: 250
  download_size: 516735
  dataset_size: 528855
- config_name: bbh_cot_fewshot_disambiguation_qa
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 1353749
    num_examples: 250
  download_size: 1354953
  dataset_size: 1353749
- config_name: bbh_cot_fewshot_dyck_languages
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 979758
    num_examples: 250
  download_size: 976894
  dataset_size: 979758
- config_name: bbh_cot_fewshot_formal_fallacies
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 1754762
    num_examples: 250
  download_size: 1741623
  dataset_size: 1754762
- config_name: bbh_cot_fewshot_geometric_shapes
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 1731692
    num_examples: 250
  download_size: 1718493
  dataset_size: 1731692
- config_name: bbh_cot_fewshot_hyperbaton
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 1181370
    num_examples: 250
  download_size: 1179056
  dataset_size: 1181370
- config_name: bbh_cot_fewshot_logical_deduction_five_objects
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 1175967
    num_examples: 250
  download_size: 1173049
  dataset_size: 1175967
- config_name: bbh_cot_fewshot_logical_deduction_seven_objects
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 1409171
    num_examples: 250
  download_size: 1408948
  dataset_size: 1409171
- config_name: bbh_cot_fewshot_logical_deduction_three_objects
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 1019190
    num_examples: 250
  download_size: 1015018
  dataset_size: 1019190
- config_name: bbh_cot_fewshot_movie_recommendation
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 838363
    num_examples: 250
  download_size: 830282
  dataset_size: 838363
- config_name: bbh_cot_fewshot_multistep_arithmetic_two
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 876598
    num_examples: 250
  download_size: 882159
  dataset_size: 876598
- config_name: bbh_cot_fewshot_navigate
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 835404
    num_examples: 250
  download_size: 828193
  dataset_size: 835404
- config_name: bbh_cot_fewshot_object_counting
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 565741
    num_examples: 250
  download_size: 554762
  dataset_size: 565741
- config_name: bbh_cot_fewshot_penguins_in_a_table
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 588011
    num_examples: 146
  download_size: 595802
  dataset_size: 588011
- config_name: bbh_cot_fewshot_reasoning_about_colored_objects
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 929409
    num_examples: 250
  download_size: 923867
  dataset_size: 929409
- config_name: bbh_cot_fewshot_ruin_names
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 1224832
    num_examples: 250
  download_size: 1223787
  dataset_size: 1224832
- config_name: bbh_cot_fewshot_salient_translation_error_detection
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 2390620
    num_examples: 250
  download_size: 2378274
  dataset_size: 2390620
- config_name: bbh_cot_fewshot_snarks
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 801753
    num_examples: 178
  download_size: 806767
  dataset_size: 801753
- config_name: bbh_cot_fewshot_sports_understanding
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 364273
    num_examples: 250
  download_size: 349036
  dataset_size: 364273
- config_name: bbh_cot_fewshot_temporal_sequences
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 1204274
    num_examples: 250
  download_size: 1202429
  dataset_size: 1204274
- config_name: bbh_cot_fewshot_tracking_shuffled_objects_five_objects
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 1266671
    num_examples: 250
  download_size: 1265687
  dataset_size: 1266671
- config_name: bbh_cot_fewshot_tracking_shuffled_objects_seven_objects
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 1501895
    num_examples: 250
  download_size: 1504535
  dataset_size: 1501895
- config_name: bbh_cot_fewshot_tracking_shuffled_objects_three_objects
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 1085008
    num_examples: 250
  download_size: 1081435
  dataset_size: 1085008
- config_name: bbh_cot_fewshot_web_of_lies
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 1103014
    num_examples: 250
  download_size: 1100173
  dataset_size: 1103014
- config_name: bbh_cot_fewshot_word_sorting
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: input
      dtype: string
    - name: target
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 1092826
    num_examples: 250
  download_size: 1100059
  dataset_size: 1092826
- config_name: cleanslate_qa
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: answer
      dtype: string
    - name: content_id
      dtype: string
    - name: content_title
      dtype: string
    - name: question
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 9455350
    num_examples: 12088
  download_size: 8564663
  dataset_size: 9455350
- config_name: coqa
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: additional_answers
      struct:
      - name: '0'
        struct:
        - name: input_text
          list: string
        - name: span_end
          list: int64
        - name: span_start
          list: int64
        - name: span_text
          list: string
        - name: turn_id
          list: int64
      - name: '1'
        struct:
        - name: input_text
          list: string
        - name: span_end
          list: int64
        - name: span_start
          list: int64
        - name: span_text
          list: string
        - name: turn_id
          list: int64
      - name: '2'
        struct:
        - name: input_text
          list: string
        - name: span_end
          list: int64
        - name: span_start
          list: int64
        - name: span_text
          list: string
        - name: turn_id
          list: int64
    - name: answers
      struct:
      - name: input_text
        list: string
      - name: span_end
        list: int64
      - name: span_start
        list: int64
      - name: span_text
        list: string
      - name: turn_id
        list: int64
    - name: id
      dtype: string
    - name: questions
      struct:
      - name: input_text
        list: string
      - name: turn_id
        list: int64
    - name: source
      dtype: string
    - name: story
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: float64
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 5763926
    num_examples: 500
  download_size: 5766913
  dataset_size: 5763926
- config_name: drop
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: answer
      struct:
      - name: date
        struct:
        - name: day
          dtype: string
        - name: month
          dtype: string
        - name: year
          dtype: string
      - name: hit_id
        dtype: string
      - name: number
        dtype: string
      - name: spans
        list: string
      - name: worker_id
        dtype: string
    - name: answers
      list:
        list: string
    - name: id
      dtype: string
    - name: passage
      dtype: string
    - name: query_id
      dtype: string
    - name: question
      dtype: string
    - name: section_id
      dtype: string
    - name: validated_answers
      struct:
      - name: date
        list:
        - name: day
          dtype: string
        - name: month
          dtype: string
        - name: year
          dtype: string
      - name: hit_id
        list: string
      - name: number
        list: string
      - name: spans
        list:
          list: string
      - name: worker_id
        list: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 29516670
    num_examples: 9536
  download_size: 27772431
  dataset_size: 29516670
- config_name: gsm8k
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: answer
      dtype: string
    - name: question
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: temperature
          dtype: float64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 11667038
    num_examples: 2638
  download_size: 10732744
  dataset_size: 11667038
- config_name: hellaswag
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: activity_label
      dtype: string
    - name: choices
      list: string
    - name: ctx
      dtype: string
    - name: ctx_a
      dtype: string
    - name: ctx_b
      dtype: string
    - name: endings
      list: string
    - name: gold
      dtype: int64
    - name: ind
      dtype: int64
    - name: label
      dtype: string
    - name: query
      dtype: string
    - name: source_id
      dtype: string
    - name: split
      dtype: string
    - name: split_type
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_1
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_2
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_3
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
  - name: resps
    list:
      list:
        list: string
  - name: filtered_resps
    list:
      list: string
  - name: filter
    dtype: string
  - name: metrics
    list: 'null'
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 39604581
    num_examples: 10042
  download_size: 38115288
  dataset_size: 39604581
- config_name: humaneval_plus
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: canonical_solution
      dtype: string
    - name: entry_point
      dtype: string
    - name: prompt
      dtype: string
    - name: task_id
      dtype: string
    - name: test
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        struct:
        - name: do_sample
          dtype: bool
        - name: max_gen_toks
          dtype: int64
        - name: until
          list: string
  - name: resps
    list:
      list: string
  - name: filtered_resps
    list:
      list: string
  - name: filter
    dtype: string
  - name: metrics
    list: string
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: bypass
    dtype: 'null'
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 22083812
    num_examples: 164
  download_size: 14080468
  dataset_size: 22083812
- config_name: lambada_openai
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: text
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
  - name: resps
    list:
      list:
        list: string
  - name: filtered_resps
    list:
      list: string
  - name: filter
    dtype: string
  - name: metrics
    list: 'null'
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 5116209
    num_examples: 5153
  download_size: 4754256
  dataset_size: 5116209
- config_name: mmlu_abstract_algebra
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: answer
      dtype: int64
    - name: choices
      list: string
    - name: question
      dtype: string
    - name: subject
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_1
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_2
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_3
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
  - name: resps
    list:
      list:
        list: string
  - name: filtered_resps
    list:
      list: string
  - name: filter
    dtype: string
  - name: metrics
    list: 'null'
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 187382
    num_examples: 100
  download_size: 189098
  dataset_size: 187382
- config_name: mmlu_anatomy
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: answer
      dtype: int64
    - name: choices
      list: string
    - name: question
      dtype: string
    - name: subject
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_1
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_2
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_3
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
  - name: resps
    list:
      list:
        list: string
  - name: filtered_resps
    list:
      list: string
  - name: filter
    dtype: string
  - name: metrics
    list: 'null'
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 282091
    num_examples: 135
  download_size: 280253
  dataset_size: 282091
- config_name: mmlu_astronomy
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: answer
      dtype: int64
    - name: choices
      list: string
    - name: question
      dtype: string
    - name: subject
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_1
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_2
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_3
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
  - name: resps
    list:
      list:
        list: string
  - name: filtered_resps
    list:
      list: string
  - name: filter
    dtype: string
  - name: metrics
    list: 'null'
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 366465
    num_examples: 152
  download_size: 363588
  dataset_size: 366465
- config_name: mmlu_business_ethics
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: answer
      dtype: int64
    - name: choices
      list: string
    - name: question
      dtype: string
    - name: subject
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_1
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_2
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_3
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
  - name: resps
    list:
      list:
        list: string
  - name: filtered_resps
    list:
      list: string
  - name: filter
    dtype: string
  - name: metrics
    list: 'null'
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 256478
    num_examples: 100
  download_size: 259846
  dataset_size: 256478
- config_name: mmlu_clinical_knowledge
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: answer
      dtype: int64
    - name: choices
      list: string
    - name: question
      dtype: string
    - name: subject
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_1
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_2
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_3
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
  - name: resps
    list:
      list:
        list: string
  - name: filtered_resps
    list:
      list: string
  - name: filter
    dtype: string
  - name: metrics
    list: 'null'
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 557057
    num_examples: 265
  download_size: 535320
  dataset_size: 557057
- config_name: mmlu_college_biology
  features:
  - name: doc_id
    dtype: int64
  - name: doc
    struct:
    - name: answer
      dtype: int64
    - name: choices
      list: string
    - name: question
      dtype: string
    - name: subject
      dtype: string
  - name: target
    dtype: string
  - name: arguments
    struct:
    - name: gen_args_0
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_1
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_2
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
    - name: gen_args_3
      struct:
      - name: arg_0
        dtype: string
      - name: arg_1
        dtype: string
  - name: resps
    list:
      list:
        list: string
  - name: filtered_resps
    list:
      list: string
  - name: filter
    dtype: string
  - name: metrics
    list: 'null'
  - name: doc_hash
    dtype: string
  - name: prompt_hash
    dtype: string
  - name: target_hash
    dtype: string
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 374067
    num_examples: 144
download_size: 370959 dataset_size: 374067 - config_name: mmlu_college_chemistry features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 214804 num_examples: 100 download_size: 219127 dataset_size: 214804 - config_name: mmlu_college_computer_science features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 308051 num_examples: 100 download_size: 316883 dataset_size: 308051 - 
config_name: mmlu_college_mathematics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 215809 num_examples: 100 download_size: 217170 dataset_size: 215809 - config_name: mmlu_college_medicine features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 569112 num_examples: 173 download_size: 564574 dataset_size: 569112 - config_name: mmlu_college_physics features: - name: doc_id 
dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 242961 num_examples: 102 download_size: 245987 dataset_size: 242961 - config_name: mmlu_computer_security features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 226902 num_examples: 100 download_size: 228456 dataset_size: 226902 - config_name: mmlu_conceptual_physics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 
- name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 419282 num_examples: 235 download_size: 400286 dataset_size: 419282 - config_name: mmlu_econometrics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 333979 num_examples: 114 download_size: 335287 dataset_size: 333979 - config_name: mmlu_electrical_engineering features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - 
name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 261691 num_examples: 145 download_size: 255745 dataset_size: 261691 - config_name: mmlu_elementary_mathematics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 705182 num_examples: 378 download_size: 661835 dataset_size: 705182 - config_name: mmlu_formal_logic features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - 
name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 360503 num_examples: 126 download_size: 361652 dataset_size: 360503 - config_name: mmlu_global_facts features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 180787 num_examples: 100 download_size: 182002 dataset_size: 180787 - config_name: mmlu_high_school_biology features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 
dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 834836 num_examples: 310 download_size: 806584 dataset_size: 834836 - config_name: mmlu_high_school_chemistry features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 481688 num_examples: 203 download_size: 467968 dataset_size: 481688 - config_name: mmlu_high_school_computer_science features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - 
name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 319168 num_examples: 100 download_size: 325504 dataset_size: 319168 - config_name: mmlu_high_school_european_history features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 1511187 num_examples: 165 download_size: 1524827 dataset_size: 1511187 - config_name: mmlu_high_school_geography features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 
dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 394832 num_examples: 198 download_size: 381442 dataset_size: 394832 - config_name: mmlu_high_school_government_and_politics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 523938 num_examples: 193 download_size: 513507 dataset_size: 523938 - config_name: mmlu_high_school_macroeconomics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: 
string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 962113 num_examples: 390 download_size: 921916 dataset_size: 962113 - config_name: mmlu_high_school_mathematics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 528966 num_examples: 270 download_size: 507464 dataset_size: 528966 - config_name: mmlu_high_school_microeconomics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: 
arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 606573 num_examples: 238 download_size: 589013 dataset_size: 606573 - config_name: mmlu_high_school_physics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 437068 num_examples: 151 download_size: 439677 dataset_size: 437068 - config_name: mmlu_high_school_psychology features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - 
name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 1308314 num_examples: 545 download_size: 1241169 dataset_size: 1308314 - config_name: mmlu_high_school_statistics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 756162 num_examples: 216 download_size: 743495 dataset_size: 756162 - config_name: mmlu_high_school_us_history features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: 
string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 1674968 num_examples: 204 download_size: 1686115 dataset_size: 1674968 - config_name: mmlu_high_school_world_history features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 2118925 num_examples: 237 download_size: 2123828 dataset_size: 2118925 - config_name: mmlu_human_aging features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps 
list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 427406 num_examples: 223 download_size: 410301 dataset_size: 427406 - config_name: mmlu_human_sexuality features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 278870 num_examples: 131 download_size: 276028 dataset_size: 278870 - config_name: mmlu_international_law features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: 
string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 378047 num_examples: 121 download_size: 383687 dataset_size: 378047 - config_name: mmlu_jurisprudence features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 266324 num_examples: 108 download_size: 266201 dataset_size: 266324 - config_name: mmlu_logical_fallacies features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' 
- name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 399445 num_examples: 163 download_size: 394165 dataset_size: 399445 - config_name: mmlu_machine_learning features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 271110 num_examples: 112 download_size: 270956 dataset_size: 271110 - config_name: mmlu_management features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - 
name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 190461 num_examples: 103 download_size: 191210 dataset_size: 190461 - config_name: mmlu_marketing features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 518931 num_examples: 234 download_size: 502282 dataset_size: 518931 - config_name: mmlu_medical_genetics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - 
name: train num_bytes: 195094 num_examples: 100 download_size: 198622 dataset_size: 195094 - config_name: mmlu_miscellaneous features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 1437994 num_examples: 783 download_size: 1340330 dataset_size: 1437994 - config_name: mmlu_moral_disputes features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 848448 num_examples: 346 download_size: 
813554 dataset_size: 848448 - config_name: mmlu_moral_scenarios features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 2678848 num_examples: 895 download_size: 2563085 dataset_size: 2678848 - config_name: mmlu_nutrition features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 727906 num_examples: 306 download_size: 702661 dataset_size: 727906 - config_name: mmlu_philosophy features: 
- name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 673358 num_examples: 311 download_size: 645970 dataset_size: 673358 - config_name: mmlu_prehistory features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 732484 num_examples: 324 download_size: 702539 dataset_size: 732484 - config_name: mmlu_professional_accounting features: - name: doc_id dtype: int64 - name: doc struct: - name: answer 
dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 888728 num_examples: 282 download_size: 865365 dataset_size: 888728 - config_name: mmlu_professional_law features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 10852524 num_examples: 1534 download_size: 10727841 dataset_size: 10852524 - config_name: mmlu_professional_medicine features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: 
question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 1341713 num_examples: 272 download_size: 1337132 dataset_size: 1341713 - config_name: mmlu_professional_psychology features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 1706647 num_examples: 612 download_size: 1633040 dataset_size: 1706647 - config_name: mmlu_public_relations features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: 
string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 243620 num_examples: 110 download_size: 244627 dataset_size: 243620 - config_name: mmlu_security_studies features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 1246739 num_examples: 245 download_size: 1235481 dataset_size: 1246739 - config_name: mmlu_sociology features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - 
name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 506705 num_examples: 201 download_size: 494037 dataset_size: 506705 - config_name: mmlu_us_foreign_policy features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 233403 num_examples: 100 download_size: 235016 dataset_size: 233403 - config_name: mmlu_virology features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 
dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 337855 num_examples: 166 download_size: 331648 dataset_size: 337855 - config_name: mmlu_world_religions features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 280496 num_examples: 171 download_size: 270727 dataset_size: 280496 - config_name: triviaqa features: - name: doc_id dtype: int64 - name: doc struct: - name: answer struct: - name: aliases list: string - name: matched_wiki_entity_name dtype: string - name: normalized_aliases list: string - name: normalized_matched_wiki_entity_name dtype: string - name: normalized_value dtype: string - name: type dtype: string - name: value dtype: string - name: entity_pages struct: 
- name: doc_source list: 'null' - name: filename list: 'null' - name: title list: 'null' - name: wiki_context list: 'null' - name: question dtype: string - name: question_id dtype: string - name: question_source dtype: string - name: search_results struct: - name: description list: 'null' - name: filename list: 'null' - name: rank list: 'null' - name: search_context list: 'null' - name: title list: 'null' - name: url list: 'null' - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: float64 - name: score dtype: float64 splits: - name: train num_bytes: 27697149 num_examples: 17944 download_size: 20974505 dataset_size: 27697149 - config_name: winogrande features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: string - name: option1 dtype: string - name: option2 dtype: string - name: sentence dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 981695 num_examples: 1267 download_size: 884607 dataset_size: 981695 configs: - config_name: arc_challenge data_files: - split: train path: arc_challenge/train-* - config_name: bbh_cot_fewshot_boolean_expressions 
data_files: - split: train path: bbh_cot_fewshot_boolean_expressions/train-* - config_name: bbh_cot_fewshot_causal_judgement data_files: - split: train path: bbh_cot_fewshot_causal_judgement/train-* - config_name: bbh_cot_fewshot_date_understanding data_files: - split: train path: bbh_cot_fewshot_date_understanding/train-* - config_name: bbh_cot_fewshot_disambiguation_qa data_files: - split: train path: bbh_cot_fewshot_disambiguation_qa/train-* - config_name: bbh_cot_fewshot_dyck_languages data_files: - split: train path: bbh_cot_fewshot_dyck_languages/train-* - config_name: bbh_cot_fewshot_formal_fallacies data_files: - split: train path: bbh_cot_fewshot_formal_fallacies/train-* - config_name: bbh_cot_fewshot_geometric_shapes data_files: - split: train path: bbh_cot_fewshot_geometric_shapes/train-* - config_name: bbh_cot_fewshot_hyperbaton data_files: - split: train path: bbh_cot_fewshot_hyperbaton/train-* - config_name: bbh_cot_fewshot_logical_deduction_five_objects data_files: - split: train path: bbh_cot_fewshot_logical_deduction_five_objects/train-* - config_name: bbh_cot_fewshot_logical_deduction_seven_objects data_files: - split: train path: bbh_cot_fewshot_logical_deduction_seven_objects/train-* - config_name: bbh_cot_fewshot_logical_deduction_three_objects data_files: - split: train path: bbh_cot_fewshot_logical_deduction_three_objects/train-* - config_name: bbh_cot_fewshot_movie_recommendation data_files: - split: train path: bbh_cot_fewshot_movie_recommendation/train-* - config_name: bbh_cot_fewshot_multistep_arithmetic_two data_files: - split: train path: bbh_cot_fewshot_multistep_arithmetic_two/train-* - config_name: bbh_cot_fewshot_navigate data_files: - split: train path: bbh_cot_fewshot_navigate/train-* - config_name: bbh_cot_fewshot_object_counting data_files: - split: train path: bbh_cot_fewshot_object_counting/train-* - config_name: bbh_cot_fewshot_penguins_in_a_table data_files: - split: train path: bbh_cot_fewshot_penguins_in_a_table/train-* - 
config_name: bbh_cot_fewshot_reasoning_about_colored_objects data_files: - split: train path: bbh_cot_fewshot_reasoning_about_colored_objects/train-* - config_name: bbh_cot_fewshot_ruin_names data_files: - split: train path: bbh_cot_fewshot_ruin_names/train-* - config_name: bbh_cot_fewshot_salient_translation_error_detection data_files: - split: train path: bbh_cot_fewshot_salient_translation_error_detection/train-* - config_name: bbh_cot_fewshot_snarks data_files: - split: train path: bbh_cot_fewshot_snarks/train-* - config_name: bbh_cot_fewshot_sports_understanding data_files: - split: train path: bbh_cot_fewshot_sports_understanding/train-* - config_name: bbh_cot_fewshot_temporal_sequences data_files: - split: train path: bbh_cot_fewshot_temporal_sequences/train-* - config_name: bbh_cot_fewshot_tracking_shuffled_objects_five_objects data_files: - split: train path: bbh_cot_fewshot_tracking_shuffled_objects_five_objects/train-* - config_name: bbh_cot_fewshot_tracking_shuffled_objects_seven_objects data_files: - split: train path: bbh_cot_fewshot_tracking_shuffled_objects_seven_objects/train-* - config_name: bbh_cot_fewshot_tracking_shuffled_objects_three_objects data_files: - split: train path: bbh_cot_fewshot_tracking_shuffled_objects_three_objects/train-* - config_name: bbh_cot_fewshot_web_of_lies data_files: - split: train path: bbh_cot_fewshot_web_of_lies/train-* - config_name: bbh_cot_fewshot_word_sorting data_files: - split: train path: bbh_cot_fewshot_word_sorting/train-* - config_name: cleanslate_qa data_files: - split: train path: cleanslate_qa/train-* - config_name: coqa data_files: - split: train path: coqa/train-* - config_name: drop data_files: - split: train path: drop/train-* - config_name: gsm8k data_files: - split: train path: gsm8k/train-* - config_name: hellaswag data_files: - split: train path: hellaswag/train-* - config_name: humaneval_plus data_files: - split: train path: humaneval_plus/train-* - config_name: lambada_openai data_files: - 
split: train path: lambada_openai/train-* - config_name: mmlu_abstract_algebra data_files: - split: train path: mmlu_abstract_algebra/train-* - config_name: mmlu_anatomy data_files: - split: train path: mmlu_anatomy/train-* - config_name: mmlu_astronomy data_files: - split: train path: mmlu_astronomy/train-* - config_name: mmlu_business_ethics data_files: - split: train path: mmlu_business_ethics/train-* - config_name: mmlu_clinical_knowledge data_files: - split: train path: mmlu_clinical_knowledge/train-* - config_name: mmlu_college_biology data_files: - split: train path: mmlu_college_biology/train-* - config_name: mmlu_college_chemistry data_files: - split: train path: mmlu_college_chemistry/train-* - config_name: mmlu_college_computer_science data_files: - split: train path: mmlu_college_computer_science/train-* - config_name: mmlu_college_mathematics data_files: - split: train path: mmlu_college_mathematics/train-* - config_name: mmlu_college_medicine data_files: - split: train path: mmlu_college_medicine/train-* - config_name: mmlu_college_physics data_files: - split: train path: mmlu_college_physics/train-* - config_name: mmlu_computer_security data_files: - split: train path: mmlu_computer_security/train-* - config_name: mmlu_conceptual_physics data_files: - split: train path: mmlu_conceptual_physics/train-* - config_name: mmlu_econometrics data_files: - split: train path: mmlu_econometrics/train-* - config_name: mmlu_electrical_engineering data_files: - split: train path: mmlu_electrical_engineering/train-* - config_name: mmlu_elementary_mathematics data_files: - split: train path: mmlu_elementary_mathematics/train-* - config_name: mmlu_formal_logic data_files: - split: train path: mmlu_formal_logic/train-* - config_name: mmlu_global_facts data_files: - split: train path: mmlu_global_facts/train-* - config_name: mmlu_high_school_biology data_files: - split: train path: mmlu_high_school_biology/train-* - config_name: mmlu_high_school_chemistry data_files: - 
split: train path: mmlu_high_school_chemistry/train-* - config_name: mmlu_high_school_computer_science data_files: - split: train path: mmlu_high_school_computer_science/train-* - config_name: mmlu_high_school_european_history data_files: - split: train path: mmlu_high_school_european_history/train-* - config_name: mmlu_high_school_geography data_files: - split: train path: mmlu_high_school_geography/train-* - config_name: mmlu_high_school_government_and_politics data_files: - split: train path: mmlu_high_school_government_and_politics/train-* - config_name: mmlu_high_school_macroeconomics data_files: - split: train path: mmlu_high_school_macroeconomics/train-* - config_name: mmlu_high_school_mathematics data_files: - split: train path: mmlu_high_school_mathematics/train-* - config_name: mmlu_high_school_microeconomics data_files: - split: train path: mmlu_high_school_microeconomics/train-* - config_name: mmlu_high_school_physics data_files: - split: train path: mmlu_high_school_physics/train-* - config_name: mmlu_high_school_psychology data_files: - split: train path: mmlu_high_school_psychology/train-* - config_name: mmlu_high_school_statistics data_files: - split: train path: mmlu_high_school_statistics/train-* - config_name: mmlu_high_school_us_history data_files: - split: train path: mmlu_high_school_us_history/train-* - config_name: mmlu_high_school_world_history data_files: - split: train path: mmlu_high_school_world_history/train-* - config_name: mmlu_human_aging data_files: - split: train path: mmlu_human_aging/train-* - config_name: mmlu_human_sexuality data_files: - split: train path: mmlu_human_sexuality/train-* - config_name: mmlu_international_law data_files: - split: train path: mmlu_international_law/train-* - config_name: mmlu_jurisprudence data_files: - split: train path: mmlu_jurisprudence/train-* - config_name: mmlu_logical_fallacies data_files: - split: train path: mmlu_logical_fallacies/train-* - config_name: mmlu_machine_learning data_files: 
- split: train path: mmlu_machine_learning/train-* - config_name: mmlu_management data_files: - split: train path: mmlu_management/train-* - config_name: mmlu_marketing data_files: - split: train path: mmlu_marketing/train-* - config_name: mmlu_medical_genetics data_files: - split: train path: mmlu_medical_genetics/train-* - config_name: mmlu_miscellaneous data_files: - split: train path: mmlu_miscellaneous/train-* - config_name: mmlu_moral_disputes data_files: - split: train path: mmlu_moral_disputes/train-* - config_name: mmlu_moral_scenarios data_files: - split: train path: mmlu_moral_scenarios/train-* - config_name: mmlu_nutrition data_files: - split: train path: mmlu_nutrition/train-* - config_name: mmlu_philosophy data_files: - split: train path: mmlu_philosophy/train-* - config_name: mmlu_prehistory data_files: - split: train path: mmlu_prehistory/train-* - config_name: mmlu_professional_accounting data_files: - split: train path: mmlu_professional_accounting/train-* - config_name: mmlu_professional_law data_files: - split: train path: mmlu_professional_law/train-* - config_name: mmlu_professional_medicine data_files: - split: train path: mmlu_professional_medicine/train-* - config_name: mmlu_professional_psychology data_files: - split: train path: mmlu_professional_psychology/train-* - config_name: mmlu_public_relations data_files: - split: train path: mmlu_public_relations/train-* - config_name: mmlu_security_studies data_files: - split: train path: mmlu_security_studies/train-* - config_name: mmlu_sociology data_files: - split: train path: mmlu_sociology/train-* - config_name: mmlu_us_foreign_policy data_files: - split: train path: mmlu_us_foreign_policy/train-* - config_name: mmlu_virology data_files: - split: train path: mmlu_virology/train-* - config_name: mmlu_world_religions data_files: - split: train path: mmlu_world_religions/train-* - config_name: triviaqa data_files: - split: train path: triviaqa/train-* - config_name: winogrande data_files: - 
split: train path: winogrande/train-* ---
Provider:
unlearning-cleanslate
Dataset introduction
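Construction
This dataset contains evaluation outputs from a Llama 3.1 8B model trained with SimNPO, an LLM unlearning method. Training used the gentle-igm-10b strategy with the target set to 100, and the outputs were collected at checkpoint-1 of a local training run. The outputs span many standard benchmarks, 94 configurations in total:
- ARC-Challenge
- 27 BBH chain-of-thought few-shot subtasks (e.g., boolean_expressions, causal_judgement, date_understanding)
- 57 MMLU subjects
- cleanslate_qa, CoQA, DROP, GSM8K, HellaSwag, HumanEval+, LAMBADA (OpenAI), TriviaQA and Winogrande

Each subtask is a separate configuration with its own field schema. A record holds the source document, the target answer, the request arguments, the raw model responses, the filtered responses and a score.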
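Features
The dataset is organized as many structured, task-specific configurations.
- Every record stores a doc_id for traceability, plus doc_hash, prompt_hash and target_hash.
- The arguments field records each request. In generative configurations such as BBH, gen_args_0.arg_1 holds the generation settings (do_sample, temperature, until, and max_gen_toks where present), so the generation process can be reproduced. In multiple-choice configurations such as ARC-Challenge and MMLU, each gen_args_i holds two strings (arg_0, arg_1), one request per answer choice.
- The response fields keep both the raw output (resps) and the filtered result (filtered_resps), together with the filter name and a score, which supports later analysis of model performance.
- The arc_challenge doc contains a choices struct (label and text lists) for multiple-choice questions. All BBH configurations share a uniform input/target doc format suited to evaluating chain-of-thought reasoning.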
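The two record layouts described above can be told apart by the type of `gen_args_0.arg_1`. This sketch assumes the lm-evaluation-harness convention: a settings dict for generative requests and a plain string for log-likelihood requests. The function names and the sample records are hypothetical, with placeholder values shaped like the BBH and ARC schemas.

```python
from typing import Any, Dict, Optional


def request_kind(record: Dict[str, Any]) -> str:
    """Guess the request type from gen_args_0.arg_1 (lm-evaluation-harness convention assumed)."""
    arg_1 = record["arguments"]["gen_args_0"]["arg_1"]
    # Generative requests carry a settings dict; log-likelihood requests carry a string.
    return "generate_until" if isinstance(arg_1, dict) else "loglikelihood"


def generation_settings(record: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return sampling settings for generative records, None for multiple-choice ones."""
    if request_kind(record) != "generate_until":
        return None
    arg_1 = record["arguments"]["gen_args_0"]["arg_1"]
    # .get() because some schemas (e.g. triviaqa) have no max_gen_toks field.
    return {key: arg_1.get(key) for key in ("do_sample", "temperature", "max_gen_toks", "until")}


def num_requests(record: Dict[str, Any]) -> int:
    """Count filled request slots; unused slots (e.g. gen_args_4 on a 4-choice item) are None."""
    return sum(
        1
        for key, value in record["arguments"].items()
        if key.startswith("gen_args_") and value is not None
    )


# Hypothetical records with placeholder values, shaped like the BBH and ARC schemas.
bbh_record = {
    "arguments": {
        "gen_args_0": {
            "arg_0": "Q: not ( True ) and ( True ) is\nA: Let's think step by step.",
            "arg_1": {"do_sample": False, "max_gen_toks": 256, "temperature": 0.0, "until": ["Q:"]},
        }
    }
}
arc_record = {
    "arguments": {
        **{f"gen_args_{i}": {"arg_0": "Question: ...\nAnswer:", "arg_1": f" choice {i}"} for i in range(4)},
        "gen_args_4": None,
    }
}

print(request_kind(bbh_record), generation_settings(bbh_record))
print(request_kind(arc_record), num_requests(arc_record))
```

Checking `value is not None` matters for ARC, whose schema reserves five request slots even though many questions have only four choices.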
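Usage
Load the dataset with the Hugging Face datasets library and choose a subtask by its config name (e.g., 'arc_challenge' or 'bbh_cot_fewshot_boolean_expressions'). Every configuration provides a single train split, which can be used directly to evaluate the model's outputs across the benchmarks. Comparing resps against doc and target measures how closely the responses match the reference answers. The filtered_resps and score fields show the effect of the filtering step. The argument fields give a complete reference for reproducing and debugging the inference runs.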
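A minimal loading-and-scoring sketch follows. It assumes the `datasets` library is installed and, for `load_config`, that the Hub is reachable. The helpers are hypothetical:
- `generative_exact_match` compares the first filtered answer with `target`.
- `multiple_choice_prediction` assumes each `filtered_resps` entry stores a log-likelihood as its first string element. lm-evaluation-harness usually logs multiple-choice results this way, but the card does not confirm it.

```python
from statistics import mean
from typing import Any, Dict, Iterable

REPO_ID = (
    "unlearning-cleanslate/generations-20-DEBUG-llama-3_1-8b-simnpo-"
    "gentle-igm-10b-target-100-localtrain-checkpoint-1"
)


def load_config(config_name: str):
    """Load one config's single 'train' split (requires `datasets` and network access)."""
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split="train")


def mean_score(records: Iterable[Dict[str, Any]]) -> float:
    """Average the per-record `score` field."""
    return mean(float(record["score"]) for record in records)


def generative_exact_match(record: Dict[str, Any]) -> bool:
    """Generative configs such as BBH: compare the first filtered answer with `target`."""
    return record["filtered_resps"][0].strip() == record["target"].strip()


def multiple_choice_prediction(record: Dict[str, Any]) -> int:
    """Multiple-choice configs such as MMLU: index of the highest log-likelihood choice.

    Assumption: each filtered_resps entry is [log-likelihood, is_greedy] stored as strings.
    """
    loglikelihoods = [float(entry[0]) for entry in record["filtered_resps"]]
    return max(range(len(loglikelihoods)), key=loglikelihoods.__getitem__)


# Hypothetical records with placeholder values.
bbh_like = {"target": "False", "filtered_resps": ["False"], "score": 1.0}
mmlu_like = {
    "doc": {"answer": 1},
    "filtered_resps": [["-3.2", "False"], ["-0.4", "True"], ["-2.7", "False"], ["-5.1", "False"]],
    "score": 1.0,
}

print(generative_exact_match(bbh_like))
print(multiple_choice_prediction(mmlu_like) == mmlu_like["doc"]["answer"])
print(mean_score([bbh_like, mmlu_like]))
```

With network access, `mean_score(load_config("mmlu_virology"))` would average a whole configuration. The stored `score` comes from the evaluation pipeline, so the recomputed checks above are best treated as cross-checks rather than replacements.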
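Background and challenges
Background
The dataset comes from research on large language model unlearning. It was generated from a Llama 3.1 8B model trained with SimNPO combined with the gentle-igm strategy. SimNPO is a reference-free simplification of negative preference optimization (NPO) for unlearning. Unlearning studies commonly use benchmarks such as ARC-Challenge, BBH and MMLU to check how much general capability a model retains. The tasks here range from commonsense reasoning to mathematics, logic and symbol manipulation, which gives a broad view of the checkpoint's behavior. The records explicitly log the model's intermediate reasoning steps together with the filtering and scoring. This lets researchers analyze the model's decision boundaries and the effect of training in detail.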
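Current challenges
The core challenge in this area is that large language models often reason unstably on tasks that require multi-step logical deduction, symbol manipulation and the integration of commonsense knowledge. Such models are also hard to align reliably. On demanding benchmarks such as ARC-Challenge and BBH, they frequently produce plausible-looking but wrong reasoning chains. Building the dataset involves its own challenges:
- choosing generation settings (such as temperature and maximum generation length) that balance diversity against coherence;
- designing filtering and scoring that reliably pick out correct, informative answers from large numbers of generations;
- normalizing formats and keeping hashes consistent across tasks (e.g., geometric shapes and date understanding) so the data can be analyzed and reused.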
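Hash consistency can be spot-checked by recomputing the hashes. This sketch assumes the lm-evaluation-harness convention:
- `target_hash` is SHA-256 over `str(target)`;
- `prompt_hash` is SHA-256 over the first request's context (`gen_args_0.arg_0`).

`doc_hash` is skipped because a `doc` reloaded through `datasets` may not serialize to the same JSON string. The record below is hypothetical.

```python
import hashlib
from typing import Any, Dict


def sha256_hex(text: str) -> str:
    """Hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def check_hashes(record: Dict[str, Any]) -> Dict[str, bool]:
    """Recompute target_hash and prompt_hash and report whether each matches.

    Assumption (lm-evaluation-harness convention): target_hash = SHA-256 of str(target),
    prompt_hash = SHA-256 of the first request's context string, gen_args_0.arg_0.
    """
    context = record["arguments"]["gen_args_0"]["arg_0"]
    return {
        "target_hash": sha256_hex(str(record["target"])) == record["target_hash"],
        "prompt_hash": sha256_hex(context) == record["prompt_hash"],
    }


# Hypothetical record whose hashes are built with the assumed convention.
prompt = "Q: Today is 9/7. What is the date tomorrow?\nA:"
record = {
    "target": "(B)",
    "target_hash": sha256_hex("(B)"),
    "arguments": {"gen_args_0": {"arg_0": prompt, "arg_1": {}}},
    "prompt_hash": sha256_hex(prompt),
}
print(check_hashes(record))
```

If real records fail these checks, the assumed convention is the first thing to revisit before concluding that the data are inconsistent.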
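Common use cases
Typical use case
The dataset is an intermediate artifact of an LLM unlearning run. Its most natural use is evaluating and validating preference-optimization-based methods such as SimNPO, which adapts SimPO's reference-free formulation to NPO. It stores the raw and filtered responses the model produced under fixed generation settings on diverse reasoning tasks (e.g., ARC-Challenge, BBH causal judgement and date understanding). This makes it possible to measure how different optimization strategies affect reasoning ability. The records are logged evaluation outputs rather than training data. They can serve as a reference point for research on preference learning and related approaches such as reinforcement learning from human feedback (RLHF).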
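Research problems addressed
The dataset helps with two problems in preference-optimization research on large language models: scarce evaluation data and the difficulty of assessing generalization. Its multi-domain reasoning tasks, including logical deduction, geometric reasoning and multistep arithmetic, provide a standardized benchmark for measuring a model's preference consistency in complex cognitive settings. By comparing raw and filtered responses, researchers can systematically examine whether an optimization method over-optimizes the model or leaves gaps in coverage. This bears on the limits of reward-free, offline optimization methods.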
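The comparison of raw and filtered responses can be sketched for generative configurations such as BBH. In these configurations, `resps` holds one list of generated strings per request and `filtered_resps` holds one string per request. Grouping scores by whether the filter changed the text is an illustrative choice, not a method taken from the source, and the records are hypothetical.

```python
from statistics import mean
from typing import Any, Dict, Iterable, List


def filter_changed(record: Dict[str, Any]) -> bool:
    """True if the filter altered the first generation of the first request."""
    raw = record["resps"][0][0]  # resps: per request, a list of generated strings
    filtered = record["filtered_resps"][0]  # filtered_resps: one string per request
    return raw.strip() != filtered.strip()


def score_by_filter_effect(records: Iterable[Dict[str, Any]]) -> Dict[str, float]:
    """Mean score for records the filter changed versus left untouched (empty groups omitted)."""
    groups: Dict[str, List[float]] = {"changed": [], "unchanged": []}
    for record in records:
        key = "changed" if filter_changed(record) else "unchanged"
        groups[key].append(float(record["score"]))
    return {key: mean(values) for key, values in groups.items() if values}


# Hypothetical BBH-style records with placeholder values.
records = [
    {"resps": [["not True is False, so ... So the answer is False."]], "filtered_resps": ["False"], "score": 1.0},
    {"resps": [["True"]], "filtered_resps": ["True"], "score": 0.0},
]
print(score_by_filter_effect(records))
```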
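Related work
Data of this kind can support work on reward-free preference optimization. Examples include comparisons between SimNPO and DPO (Direct Preference Optimization) and studies of response-selection strategies such as self-consistency filtering and rejection sampling. The structured BBH response records also provide an empirical basis for analyzing how reasoning-chain length relates to model performance.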
The content above was collected and summarized by 遇见数据集.