
unlearning-cleanslate/generations-olmo-3-7b-rmu-baseline

Source: Hugging Face. Updated 2026-05-02; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/unlearning-cleanslate/generations-olmo-3-7b-rmu-baseline
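Each config in this repository exposes a single `train` split whose rows carry, among other fields, `doc_id`, `target`, `filtered_resps`, and a float `score`. The following is a minimal sketch of how one config might be loaded and its scores averaged. It assumes the Hugging Face `datasets` package is installed and that the repository is reachable; the helper names `load_config` and `mean_score` and the sample rows are hypothetical and only mirror the field names listed in the schema.

```python
from statistics import mean

# Repository id taken from the download link above.
REPO_ID = "unlearning-cleanslate/generations-olmo-3-7b-rmu-baseline"


def load_config(config_name):
    """Fetch the `train` split of one config, e.g. "gsm8k" (needs network access)."""
    # Imported lazily so mean_score() below works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split="train")


def mean_score(rows):
    """Average the per-example `score` field, skipping rows where it is None."""
    scores = [row["score"] for row in rows if row["score"] is not None]
    return mean(scores) if scores else float("nan")


# Hypothetical rows shaped like the gsm8k config (not real data from the repository).
sample_rows = [
    {"doc_id": 0, "target": "18", "filtered_resps": ["18"], "score": 1.0},
    {"doc_id": 1, "target": "3", "filtered_resps": ["5"], "score": 0.0},
]
print(mean_score(sample_rows))
```

With network access, `mean_score(load_config("gsm8k"))` would apply the same averaging to the real rows. What `score` measures is not stated on this page, so treat the average as a raw summary rather than a named metric.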
Resource description:
--- dataset_info: - config_name: arc_challenge features: - name: doc_id dtype: int64 - name: doc struct: - name: answerKey dtype: string - name: choices struct: - name: label list: string - name: text list: string - name: id dtype: string - name: question dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_4 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 1903220 num_examples: 1172 download_size: 1728974 dataset_size: 1903220 - config_name: bbh_cot_fewshot_boolean_expressions features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 701918 num_examples: 250 download_size: 691087 dataset_size: 701918 - config_name: bbh_cot_fewshot_causal_judgement features: - name: doc_id 
dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1248417 num_examples: 187 download_size: 1240419 dataset_size: 1248417 - config_name: bbh_cot_fewshot_date_understanding features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 531805 num_examples: 250 download_size: 519441 dataset_size: 531805 - config_name: bbh_cot_fewshot_disambiguation_qa features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string 
- name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1348973 num_examples: 250 download_size: 1350163 dataset_size: 1348973 - config_name: bbh_cot_fewshot_dyck_languages features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1283235 num_examples: 250 download_size: 1295536 dataset_size: 1283235 - config_name: bbh_cot_fewshot_formal_fallacies features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1810734 num_examples: 250 download_size: 1796172 dataset_size: 1810734 - config_name: 
bbh_cot_fewshot_geometric_shapes features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1822865 num_examples: 250 download_size: 1812517 dataset_size: 1822865 - config_name: bbh_cot_fewshot_hyperbaton features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1183877 num_examples: 250 download_size: 1181708 dataset_size: 1183877 - config_name: bbh_cot_fewshot_logical_deduction_five_objects features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: 
float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1282880 num_examples: 250 download_size: 1284434 dataset_size: 1282880 - config_name: bbh_cot_fewshot_logical_deduction_seven_objects features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1594150 num_examples: 250 download_size: 1605664 dataset_size: 1594150 - config_name: bbh_cot_fewshot_logical_deduction_three_objects features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 
1063791 num_examples: 250 download_size: 1059695 dataset_size: 1063791 - config_name: bbh_cot_fewshot_movie_recommendation features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 830946 num_examples: 250 download_size: 822761 dataset_size: 830946 - config_name: bbh_cot_fewshot_multistep_arithmetic_two features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 916086 num_examples: 250 download_size: 921656 dataset_size: 916086 - config_name: bbh_cot_fewshot_navigate features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample 
dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 837184 num_examples: 250 download_size: 829994 dataset_size: 837184 - config_name: bbh_cot_fewshot_object_counting features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 568292 num_examples: 250 download_size: 557340 dataset_size: 568292 - config_name: bbh_cot_fewshot_penguins_in_a_table features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: 
float64 splits: - name: train num_bytes: 582209 num_examples: 146 download_size: 589810 dataset_size: 582209 - config_name: bbh_cot_fewshot_reasoning_about_colored_objects features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 946354 num_examples: 250 download_size: 940618 dataset_size: 946354 - config_name: bbh_cot_fewshot_ruin_names features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1266193 num_examples: 250 download_size: 1264885 dataset_size: 1266193 - config_name: bbh_cot_fewshot_salient_translation_error_detection features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - 
name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 2436403 num_examples: 250 download_size: 2423850 dataset_size: 2436403 - config_name: bbh_cot_fewshot_snarks features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 852882 num_examples: 178 download_size: 858142 dataset_size: 852882 - config_name: bbh_cot_fewshot_sports_understanding features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash 
dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 363231 num_examples: 250 download_size: 347949 dataset_size: 363231 - config_name: bbh_cot_fewshot_temporal_sequences features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1201262 num_examples: 250 download_size: 1199449 dataset_size: 1201262 - config_name: bbh_cot_fewshot_tracking_shuffled_objects_five_objects features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1249718 num_examples: 250 download_size: 1248493 dataset_size: 1249718 - config_name: bbh_cot_fewshot_tracking_shuffled_objects_seven_objects features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: 
string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1466909 num_examples: 250 download_size: 1469207 dataset_size: 1466909 - config_name: bbh_cot_fewshot_tracking_shuffled_objects_three_objects features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1082391 num_examples: 250 download_size: 1078794 dataset_size: 1082391 - config_name: bbh_cot_fewshot_web_of_lies features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string 
- name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1098941 num_examples: 250 download_size: 1096019 dataset_size: 1098941 - config_name: bbh_cot_fewshot_word_sorting features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1047911 num_examples: 250 download_size: 1055254 dataset_size: 1047911 - config_name: cleanslate_qa features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: string - name: content_id dtype: string - name: content_title dtype: string - name: question dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 7866662 num_examples: 12088 download_size: 6975697 dataset_size: 7866662 - config_name: coqa features: - name: doc_id dtype: int64 - name: doc 
struct: - name: additional_answers struct: - name: '0' struct: - name: input_text list: string - name: span_end list: int64 - name: span_start list: int64 - name: span_text list: string - name: turn_id list: int64 - name: '1' struct: - name: input_text list: string - name: span_end list: int64 - name: span_start list: int64 - name: span_text list: string - name: turn_id list: int64 - name: '2' struct: - name: input_text list: string - name: span_end list: int64 - name: span_start list: int64 - name: span_text list: string - name: turn_id list: int64 - name: answers struct: - name: input_text list: string - name: span_end list: int64 - name: span_start list: int64 - name: span_text list: string - name: turn_id list: int64 - name: id dtype: string - name: questions struct: - name: input_text list: string - name: turn_id list: int64 - name: source dtype: string - name: story dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: float64 - name: score dtype: float64 splits: - name: train num_bytes: 5504410 num_examples: 500 download_size: 5507551 dataset_size: 5504410 - config_name: drop features: - name: doc_id dtype: int64 - name: doc struct: - name: answer struct: - name: date struct: - name: day dtype: string - name: month dtype: string - name: year dtype: string - name: hit_id dtype: string - name: number dtype: string - name: spans list: string - name: worker_id dtype: string - name: answers list: list: string - name: id dtype: string - name: passage dtype: string - name: query_id dtype: string - name: question dtype: string - name: section_id dtype: string - name: validated_answers struct: - name: date 
list: - name: day dtype: string - name: month dtype: string - name: year dtype: string - name: hit_id list: string - name: number list: string - name: spans list: list: string - name: worker_id list: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 27216496 num_examples: 9536 download_size: 25472949 dataset_size: 27216496 - config_name: gsm8k features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: string - name: question dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 11660865 num_examples: 2638 download_size: 10725598 dataset_size: 11660865 - config_name: hellaswag features: - name: doc_id dtype: int64 - name: doc struct: - name: activity_label dtype: string - name: choices list: string - name: ctx dtype: string - name: ctx_a dtype: string - name: ctx_b dtype: string - name: endings list: string - name: gold dtype: int64 - name: ind dtype: int64 - name: label dtype: string - name: query dtype: string - name: source_id dtype: string - name: split dtype: string - name: split_type dtype: string - name: target dtype: 
string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 39600177 num_examples: 10042 download_size: 38110970 dataset_size: 39600177 - config_name: humaneval_plus features: - name: doc_id dtype: int64 - name: doc struct: - name: canonical_solution dtype: string - name: entry_point dtype: string - name: prompt dtype: string - name: task_id dtype: string - name: test dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 22127542 num_examples: 164 download_size: 14118418 dataset_size: 22127542 - config_name: lambada_openai features: - name: doc_id dtype: int64 - name: doc struct: - name: text dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 
'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 5114399 num_examples: 5153 download_size: 4752421 dataset_size: 5114399 - config_name: mmlu_abstract_algebra features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 187414 num_examples: 100 download_size: 189131 dataset_size: 187414 - config_name: mmlu_anatomy features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: 
string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 282163 num_examples: 135 download_size: 280325 dataset_size: 282163 - config_name: mmlu_astronomy features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 366557 num_examples: 152 download_size: 363681 dataset_size: 366557 - config_name: mmlu_business_ethics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 
splits: - name: train num_bytes: 256548 num_examples: 100 download_size: 259912 dataset_size: 256548 - config_name: mmlu_clinical_knowledge features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 557013 num_examples: 265 download_size: 535268 dataset_size: 557013 - config_name: mmlu_college_biology features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 373965 num_examples: 144 
download_size: 370853 dataset_size: 373965 - config_name: mmlu_college_chemistry features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 214810 num_examples: 100 download_size: 219129 dataset_size: 214810 - config_name: mmlu_college_computer_science features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 307947 num_examples: 100 download_size: 316781 dataset_size: 307947 - 
config_name: mmlu_college_mathematics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 215865 num_examples: 100 download_size: 217226 dataset_size: 215865 - config_name: mmlu_college_medicine features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 569064 num_examples: 173 download_size: 564524 dataset_size: 569064 - config_name: mmlu_college_physics features: - name: doc_id 
dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 242973 num_examples: 102 download_size: 245991 dataset_size: 242973 - config_name: mmlu_computer_security features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 226932 num_examples: 100 download_size: 228479 dataset_size: 226932 - config_name: mmlu_conceptual_physics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 
- name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 419312 num_examples: 235 download_size: 400312 dataset_size: 419312 - config_name: mmlu_econometrics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 333991 num_examples: 114 download_size: 335299 dataset_size: 333991 - config_name: mmlu_electrical_engineering features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - 
name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 261615 num_examples: 145 download_size: 255661 dataset_size: 261615 - config_name: mmlu_elementary_mathematics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 705154 num_examples: 378 download_size: 661807 dataset_size: 705154 - config_name: mmlu_formal_logic features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - 
name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 360573 num_examples: 126 download_size: 361722 dataset_size: 360573 - config_name: mmlu_global_facts features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 180829 num_examples: 100 download_size: 182037 dataset_size: 180829 - config_name: mmlu_high_school_biology features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 
dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 834778 num_examples: 310 download_size: 806521 dataset_size: 834778 - config_name: mmlu_high_school_chemistry features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 481682 num_examples: 203 download_size: 467958 dataset_size: 481682 - config_name: mmlu_high_school_computer_science features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - 
name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 319126 num_examples: 100 download_size: 325458 dataset_size: 319126 - config_name: mmlu_high_school_european_history features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 1511259 num_examples: 165 download_size: 1524891 dataset_size: 1511259 - config_name: mmlu_high_school_geography features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 
dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 394874 num_examples: 198 download_size: 381479 dataset_size: 394874 - config_name: mmlu_high_school_government_and_politics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 523980 num_examples: 193 download_size: 513552 dataset_size: 523980 - config_name: mmlu_high_school_macroeconomics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: 
string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 962131 num_examples: 390 download_size: 921928 dataset_size: 962131 - config_name: mmlu_high_school_mathematics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 528954 num_examples: 270 download_size: 507448 dataset_size: 528954 - config_name: mmlu_high_school_microeconomics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: 
arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 606471 num_examples: 238 download_size: 588913 dataset_size: 606471 - config_name: mmlu_high_school_physics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 437076 num_examples: 151 download_size: 439683 dataset_size: 437076 - config_name: mmlu_high_school_psychology features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - 
name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 1308312 num_examples: 545 download_size: 1241152 dataset_size: 1308312 - config_name: mmlu_high_school_statistics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 756222 num_examples: 216 download_size: 743557 dataset_size: 756222 - config_name: mmlu_high_school_us_history features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: 
string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 1675120 num_examples: 204 download_size: 1686265 dataset_size: 1675120 - config_name: mmlu_high_school_world_history features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 2119005 num_examples: 237 download_size: 2123906 dataset_size: 2119005 - config_name: mmlu_human_aging features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps 
list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 427482 num_examples: 223 download_size: 410369 dataset_size: 427482 - config_name: mmlu_human_sexuality features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 278846 num_examples: 131 download_size: 276002 dataset_size: 278846 - config_name: mmlu_international_law features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: 
string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 378109 num_examples: 121 download_size: 383753 dataset_size: 378109 - config_name: mmlu_jurisprudence features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 266378 num_examples: 108 download_size: 266255 dataset_size: 266378 - config_name: mmlu_logical_fallacies features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' 
- name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 399353 num_examples: 163 download_size: 394067 dataset_size: 399353 - config_name: mmlu_machine_learning features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 271090 num_examples: 112 download_size: 270936 dataset_size: 271090 - config_name: mmlu_management features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - 
name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 190413 num_examples: 103 download_size: 191156 dataset_size: 190413 - config_name: mmlu_marketing features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 518805 num_examples: 234 download_size: 502153 dataset_size: 518805 - config_name: mmlu_medical_genetics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - 
name: train num_bytes: 195122 num_examples: 100 download_size: 198643 dataset_size: 195122 - config_name: mmlu_miscellaneous features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 1437816 num_examples: 783 download_size: 1340149 dataset_size: 1437816 - config_name: mmlu_moral_disputes features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 848582 num_examples: 346 download_size: 
813679 dataset_size: 848582 - config_name: mmlu_moral_scenarios features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 2678666 num_examples: 895 download_size: 2562917 dataset_size: 2678666 - config_name: mmlu_nutrition features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 727990 num_examples: 306 download_size: 702743 dataset_size: 727990 - config_name: mmlu_philosophy features: 
- name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 673540 num_examples: 311 download_size: 646143 dataset_size: 673540 - config_name: mmlu_prehistory features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 732576 num_examples: 324 download_size: 702629 dataset_size: 732576 - config_name: mmlu_professional_accounting features: - name: doc_id dtype: int64 - name: doc struct: - name: answer 
dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 888846 num_examples: 282 download_size: 865475 dataset_size: 888846 - config_name: mmlu_professional_law features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 10852808 num_examples: 1534 download_size: 10728119 dataset_size: 10852808 - config_name: mmlu_professional_medicine features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: 
question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 1341757 num_examples: 272 download_size: 1337159 dataset_size: 1341757 - config_name: mmlu_professional_psychology features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 1706765 num_examples: 612 download_size: 1633154 dataset_size: 1706765 - config_name: mmlu_public_relations features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: 
string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 243682 num_examples: 110 download_size: 244685 dataset_size: 243682 - config_name: mmlu_security_studies features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 1246799 num_examples: 245 download_size: 1235540 dataset_size: 1246799 - config_name: mmlu_sociology features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - 
name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 506703 num_examples: 201 download_size: 494026 dataset_size: 506703 - config_name: mmlu_us_foreign_policy features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 233505 num_examples: 100 download_size: 235107 dataset_size: 233505 - config_name: mmlu_virology features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 
dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 337895 num_examples: 166 download_size: 331688 dataset_size: 337895 - config_name: mmlu_world_religions features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 280350 num_examples: 171 download_size: 270571 dataset_size: 280350 - config_name: triviaqa features: - name: doc_id dtype: int64 - name: doc struct: - name: answer struct: - name: aliases list: string - name: matched_wiki_entity_name dtype: string - name: normalized_aliases list: string - name: normalized_matched_wiki_entity_name dtype: string - name: normalized_value dtype: string - name: type dtype: string - name: value dtype: string - name: entity_pages struct: 
- name: doc_source list: 'null' - name: filename list: 'null' - name: title list: 'null' - name: wiki_context list: 'null' - name: question dtype: string - name: question_id dtype: string - name: question_source dtype: string - name: search_results struct: - name: description list: 'null' - name: filename list: 'null' - name: rank list: 'null' - name: search_context list: 'null' - name: title list: 'null' - name: url list: 'null' - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: float64 - name: score dtype: float64 splits: - name: train num_bytes: 27488078 num_examples: 17944 download_size: 20765401 dataset_size: 27488078 - config_name: winogrande features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: string - name: option1 dtype: string - name: option2 dtype: string - name: sentence dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 981627 num_examples: 1267 download_size: 884524 dataset_size: 981627 configs: - config_name: arc_challenge data_files: - split: train path: arc_challenge/train-* - config_name: bbh_cot_fewshot_boolean_expressions 
data_files: - split: train path: bbh_cot_fewshot_boolean_expressions/train-* - config_name: bbh_cot_fewshot_causal_judgement data_files: - split: train path: bbh_cot_fewshot_causal_judgement/train-* - config_name: bbh_cot_fewshot_date_understanding data_files: - split: train path: bbh_cot_fewshot_date_understanding/train-* - config_name: bbh_cot_fewshot_disambiguation_qa data_files: - split: train path: bbh_cot_fewshot_disambiguation_qa/train-* - config_name: bbh_cot_fewshot_dyck_languages data_files: - split: train path: bbh_cot_fewshot_dyck_languages/train-* - config_name: bbh_cot_fewshot_formal_fallacies data_files: - split: train path: bbh_cot_fewshot_formal_fallacies/train-* - config_name: bbh_cot_fewshot_geometric_shapes data_files: - split: train path: bbh_cot_fewshot_geometric_shapes/train-* - config_name: bbh_cot_fewshot_hyperbaton data_files: - split: train path: bbh_cot_fewshot_hyperbaton/train-* - config_name: bbh_cot_fewshot_logical_deduction_five_objects data_files: - split: train path: bbh_cot_fewshot_logical_deduction_five_objects/train-* - config_name: bbh_cot_fewshot_logical_deduction_seven_objects data_files: - split: train path: bbh_cot_fewshot_logical_deduction_seven_objects/train-* - config_name: bbh_cot_fewshot_logical_deduction_three_objects data_files: - split: train path: bbh_cot_fewshot_logical_deduction_three_objects/train-* - config_name: bbh_cot_fewshot_movie_recommendation data_files: - split: train path: bbh_cot_fewshot_movie_recommendation/train-* - config_name: bbh_cot_fewshot_multistep_arithmetic_two data_files: - split: train path: bbh_cot_fewshot_multistep_arithmetic_two/train-* - config_name: bbh_cot_fewshot_navigate data_files: - split: train path: bbh_cot_fewshot_navigate/train-* - config_name: bbh_cot_fewshot_object_counting data_files: - split: train path: bbh_cot_fewshot_object_counting/train-* - config_name: bbh_cot_fewshot_penguins_in_a_table data_files: - split: train path: bbh_cot_fewshot_penguins_in_a_table/train-* - 
config_name: bbh_cot_fewshot_reasoning_about_colored_objects data_files: - split: train path: bbh_cot_fewshot_reasoning_about_colored_objects/train-* - config_name: bbh_cot_fewshot_ruin_names data_files: - split: train path: bbh_cot_fewshot_ruin_names/train-* - config_name: bbh_cot_fewshot_salient_translation_error_detection data_files: - split: train path: bbh_cot_fewshot_salient_translation_error_detection/train-* - config_name: bbh_cot_fewshot_snarks data_files: - split: train path: bbh_cot_fewshot_snarks/train-* - config_name: bbh_cot_fewshot_sports_understanding data_files: - split: train path: bbh_cot_fewshot_sports_understanding/train-* - config_name: bbh_cot_fewshot_temporal_sequences data_files: - split: train path: bbh_cot_fewshot_temporal_sequences/train-* - config_name: bbh_cot_fewshot_tracking_shuffled_objects_five_objects data_files: - split: train path: bbh_cot_fewshot_tracking_shuffled_objects_five_objects/train-* - config_name: bbh_cot_fewshot_tracking_shuffled_objects_seven_objects data_files: - split: train path: bbh_cot_fewshot_tracking_shuffled_objects_seven_objects/train-* - config_name: bbh_cot_fewshot_tracking_shuffled_objects_three_objects data_files: - split: train path: bbh_cot_fewshot_tracking_shuffled_objects_three_objects/train-* - config_name: bbh_cot_fewshot_web_of_lies data_files: - split: train path: bbh_cot_fewshot_web_of_lies/train-* - config_name: bbh_cot_fewshot_word_sorting data_files: - split: train path: bbh_cot_fewshot_word_sorting/train-* - config_name: cleanslate_qa data_files: - split: train path: cleanslate_qa/train-* - config_name: coqa data_files: - split: train path: coqa/train-* - config_name: drop data_files: - split: train path: drop/train-* - config_name: gsm8k data_files: - split: train path: gsm8k/train-* - config_name: hellaswag data_files: - split: train path: hellaswag/train-* - config_name: humaneval_plus data_files: - split: train path: humaneval_plus/train-* - config_name: lambada_openai data_files: - 
split: train path: lambada_openai/train-* - config_name: mmlu_abstract_algebra data_files: - split: train path: mmlu_abstract_algebra/train-* - config_name: mmlu_anatomy data_files: - split: train path: mmlu_anatomy/train-* - config_name: mmlu_astronomy data_files: - split: train path: mmlu_astronomy/train-* - config_name: mmlu_business_ethics data_files: - split: train path: mmlu_business_ethics/train-* - config_name: mmlu_clinical_knowledge data_files: - split: train path: mmlu_clinical_knowledge/train-* - config_name: mmlu_college_biology data_files: - split: train path: mmlu_college_biology/train-* - config_name: mmlu_college_chemistry data_files: - split: train path: mmlu_college_chemistry/train-* - config_name: mmlu_college_computer_science data_files: - split: train path: mmlu_college_computer_science/train-* - config_name: mmlu_college_mathematics data_files: - split: train path: mmlu_college_mathematics/train-* - config_name: mmlu_college_medicine data_files: - split: train path: mmlu_college_medicine/train-* - config_name: mmlu_college_physics data_files: - split: train path: mmlu_college_physics/train-* - config_name: mmlu_computer_security data_files: - split: train path: mmlu_computer_security/train-* - config_name: mmlu_conceptual_physics data_files: - split: train path: mmlu_conceptual_physics/train-* - config_name: mmlu_econometrics data_files: - split: train path: mmlu_econometrics/train-* - config_name: mmlu_electrical_engineering data_files: - split: train path: mmlu_electrical_engineering/train-* - config_name: mmlu_elementary_mathematics data_files: - split: train path: mmlu_elementary_mathematics/train-* - config_name: mmlu_formal_logic data_files: - split: train path: mmlu_formal_logic/train-* - config_name: mmlu_global_facts data_files: - split: train path: mmlu_global_facts/train-* - config_name: mmlu_high_school_biology data_files: - split: train path: mmlu_high_school_biology/train-* - config_name: mmlu_high_school_chemistry data_files: - 
split: train path: mmlu_high_school_chemistry/train-* - config_name: mmlu_high_school_computer_science data_files: - split: train path: mmlu_high_school_computer_science/train-* - config_name: mmlu_high_school_european_history data_files: - split: train path: mmlu_high_school_european_history/train-* - config_name: mmlu_high_school_geography data_files: - split: train path: mmlu_high_school_geography/train-* - config_name: mmlu_high_school_government_and_politics data_files: - split: train path: mmlu_high_school_government_and_politics/train-* - config_name: mmlu_high_school_macroeconomics data_files: - split: train path: mmlu_high_school_macroeconomics/train-* - config_name: mmlu_high_school_mathematics data_files: - split: train path: mmlu_high_school_mathematics/train-* - config_name: mmlu_high_school_microeconomics data_files: - split: train path: mmlu_high_school_microeconomics/train-* - config_name: mmlu_high_school_physics data_files: - split: train path: mmlu_high_school_physics/train-* - config_name: mmlu_high_school_psychology data_files: - split: train path: mmlu_high_school_psychology/train-* - config_name: mmlu_high_school_statistics data_files: - split: train path: mmlu_high_school_statistics/train-* - config_name: mmlu_high_school_us_history data_files: - split: train path: mmlu_high_school_us_history/train-* - config_name: mmlu_high_school_world_history data_files: - split: train path: mmlu_high_school_world_history/train-* - config_name: mmlu_human_aging data_files: - split: train path: mmlu_human_aging/train-* - config_name: mmlu_human_sexuality data_files: - split: train path: mmlu_human_sexuality/train-* - config_name: mmlu_international_law data_files: - split: train path: mmlu_international_law/train-* - config_name: mmlu_jurisprudence data_files: - split: train path: mmlu_jurisprudence/train-* - config_name: mmlu_logical_fallacies data_files: - split: train path: mmlu_logical_fallacies/train-* - config_name: mmlu_machine_learning data_files: 
- split: train path: mmlu_machine_learning/train-* - config_name: mmlu_management data_files: - split: train path: mmlu_management/train-* - config_name: mmlu_marketing data_files: - split: train path: mmlu_marketing/train-* - config_name: mmlu_medical_genetics data_files: - split: train path: mmlu_medical_genetics/train-* - config_name: mmlu_miscellaneous data_files: - split: train path: mmlu_miscellaneous/train-* - config_name: mmlu_moral_disputes data_files: - split: train path: mmlu_moral_disputes/train-* - config_name: mmlu_moral_scenarios data_files: - split: train path: mmlu_moral_scenarios/train-* - config_name: mmlu_nutrition data_files: - split: train path: mmlu_nutrition/train-* - config_name: mmlu_philosophy data_files: - split: train path: mmlu_philosophy/train-* - config_name: mmlu_prehistory data_files: - split: train path: mmlu_prehistory/train-* - config_name: mmlu_professional_accounting data_files: - split: train path: mmlu_professional_accounting/train-* - config_name: mmlu_professional_law data_files: - split: train path: mmlu_professional_law/train-* - config_name: mmlu_professional_medicine data_files: - split: train path: mmlu_professional_medicine/train-* - config_name: mmlu_professional_psychology data_files: - split: train path: mmlu_professional_psychology/train-* - config_name: mmlu_public_relations data_files: - split: train path: mmlu_public_relations/train-* - config_name: mmlu_security_studies data_files: - split: train path: mmlu_security_studies/train-* - config_name: mmlu_sociology data_files: - split: train path: mmlu_sociology/train-* - config_name: mmlu_us_foreign_policy data_files: - split: train path: mmlu_us_foreign_policy/train-* - config_name: mmlu_virology data_files: - split: train path: mmlu_virology/train-* - config_name: mmlu_world_religions data_files: - split: train path: mmlu_world_religions/train-* - config_name: triviaqa data_files: - split: train path: triviaqa/train-* - config_name: winogrande data_files: - 
split: train path: winogrande/train-* ---
Provided by:
unlearning-cleanslate
Summary
Dataset Introduction
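Construction
This dataset is a systematic collection of generations from the OLMo-3-7B model under the RMU (Representation Misdirection for Unlearning) baseline method. It is built on several established benchmarks, including ARC Challenge and multiple subtasks of the BBH (BIG-Bench Hard) suite such as boolean_expressions, causal_judgement, and date_understanding; the config list also covers MMLU subjects, triviaqa, and winogrande. For each benchmark task, a task-specific prompt template and set of generation parameters (such as temperature, maximum generation length, and whether to sample) are used to elicit raw responses from the model. The responses are then filtered and scored, yielding a structured dataset that records the doc_id, the original document, the target answer, the generation arguments, the response list, the filtered responses, and the evaluation metrics.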
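Features
The dataset's defining feature is its multi-dimensional, fine-grained structure. Each sample contains not only the standard question and answer fields but also the complete generation configuration (do_sample, max_gen_toks, temperature, and the until stop sequences), which gives detailed context for reproducing and analyzing model behavior. The field structure also varies by task type: ARC Challenge records nest the answer choices and a sequence of request arguments (gen_args_0 through gen_args_4), while BBH records keep the full chain-of-thought generations. This design lets the dataset support both quality evaluation of model outputs and mechanistic study of model robustness and alignment.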
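As an illustration of the stored generation configuration, the following is a minimal sketch that pulls the decoding settings out of one record. It assumes the field layout listed in the dataset card schema for the BBH configs (arguments, then gen_args_0, then arg_1 holding do_sample, max_gen_toks, temperature, and until). The helper name `generation_settings` and every value in `example_record` are hypothetical and are not taken from the dataset.

```python
from collections.abc import Mapping
from typing import Any

SETTING_KEYS = ("do_sample", "max_gen_toks", "temperature", "until")


def generation_settings(record: Mapping[str, Any]) -> dict[str, Any]:
    """Return the decoding settings stored in a record's first request."""
    arg_1 = record["arguments"]["gen_args_0"]["arg_1"]
    if not isinstance(arg_1, Mapping):
        # In arc_challenge the schema lists arg_1 as a plain string,
        # so there is no settings struct to read.
        raise ValueError("arg_1 is not a settings struct for this config")
    return {key: arg_1[key] for key in SETTING_KEYS}


# Hypothetical record: field names follow the schema, values are invented.
example_record = {
    "doc_id": 0,
    "arguments": {
        "gen_args_0": {
            "arg_0": "not ( True ) and ( True ) is",
            "arg_1": {
                "do_sample": False,
                "max_gen_toks": 1024,
                "temperature": 0.0,
                "until": ["\n\n"],
            },
        }
    },
}

settings = generation_settings(example_record)
```

The explicit type check is there because the multiple-choice configs store a string in arg_1, so the same helper cannot be applied to them unchanged.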
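Usage
The dataset is suited to evaluating safety alignment and unlearning in large language models. Researchers can load the split of a given config (such as arc_challenge or bbh_cot_fewshot_boolean_expressions), extract the question and reference answer from the doc field, and analyze generation quality using the resps and filtered_resps fields. The original generation parameters stored in the arguments field can be used to reproduce experiments or adjust the inference strategy. The score and metrics fields allow per-subtask performance to be computed directly, which speeds up quantitative analysis of unlearning effectiveness and safety.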
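The loading and scoring steps described here can be sketched as follows. This is a minimal example that assumes the Hugging Face `datasets` library is installed and that the repository ID shown on this page is reachable. The helper names `load_config` and `mean_score` are hypothetical, and treating the mean of the score field as the task-level metric is an assumption rather than a documented convention of the dataset.

```python
from statistics import mean
from typing import Any, Iterable, Mapping

REPO_ID = "unlearning-cleanslate/generations-olmo-3-7b-rmu-baseline"


def load_config(config_name: str):
    """Load the train split of one config from the Hub (needs network access)."""
    # Imported lazily so that mean_score below can be used without `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split="train")


def mean_score(records: Iterable[Mapping[str, Any]]) -> float:
    """Average the per-example score field, skipping records without a score."""
    scores = [r["score"] for r in records if r.get("score") is not None]
    if not scores:
        raise ValueError("no scored records found")
    return mean(scores)


# Usage against the real data (not executed here):
#   records = load_config("arc_challenge")
#   print(mean_score(records))
```

Iterating over the loaded split yields one dictionary per example, so `mean_score` can be applied to it directly or to any list of dictionaries with a score key.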
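Background and Challenges
Background Overview
The dataset is published on Hugging Face by the unlearning-cleanslate organization. generations-olmo-3-7b-rmu-baseline is based on the OLMo-3-7B model, part of the OLMo family from the Allen Institute for AI (AI2), with the RMU (Representation Misdirection for Unlearning) baseline applied. It records the model's outputs on benchmarks covering commonsense and science reasoning (such as ARC Challenge), complex logical reasoning (such as the BBH suite), and multi-step arithmetic, so that these core capabilities can be evaluated after unlearning. It serves as a benchmark resource for studying LLM robustness and controllable generation, and for examining how deep and stable a model's reasoning remains.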
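Current Challenges
The core challenges facing the dataset fall into three groups. At the domain level, large language models commonly exhibit hallucination and bias on complex tasks such as logical reasoning, causal judgment, and spatial navigation, and consistency across a multi-step reasoning chain is especially hard to guarantee. At the construction level, it is difficult to collect high-quality, diverse reasoning data and ensure accurate labels; for example, the formal fallacies and object counting tasks in BBH require careful manual verification. The filtering and scoring of model responses (the filtered_resps and score fields) must also balance efficiency against fairness so that noisy data does not distort downstream fine-tuning. Finally, balancing data scale against task coverage is difficult: the dataset must stay representative while construction cost is kept under control.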
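Common Scenarios
Classic Use Cases
In research on large language model safety and alignment, generations-olmo-3-7b-rmu-baseline is commonly used to evaluate how well a model retains its reasoning ability after unsafe knowledge has been removed. The dataset combines several established benchmarks, including ARC-Challenge science reasoning and the BBH suite of complex logical reasoning tasks (such as boolean expressions, causal judgment, date understanding, and disambiguation QA). By recording both the model's generated responses to the original inputs and the filtered outputs, it provides a standardized testbed for measuring the potential impact of RMU (Representation Misdirection for Unlearning) on general capabilities. Researchers can use it to compare performance across multiple reasoning tasks before and after knowledge erasure.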
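A before-and-after comparison of this kind can be sketched as a per-task score difference. The example below assumes that mean scores per config have already been computed for both the unlearned model and a reference model; this page does not name a matching pre-unlearning dataset, so the reference scores are an assumption. The function name `score_deltas` is hypothetical, and all numbers are invented for illustration only.

```python
from typing import Mapping


def score_deltas(
    before: Mapping[str, float], after: Mapping[str, float]
) -> dict[str, float]:
    """Return after-minus-before score for every task present in both runs."""
    shared_tasks = sorted(set(before) & set(after))
    return {task: after[task] - before[task] for task in shared_tasks}


# Invented scores, not measurements from the dataset.
reference_scores = {"arc_challenge": 0.5, "winogrande": 0.75}
unlearned_scores = {"arc_challenge": 0.25, "winogrande": 0.75, "triviaqa": 0.5}

deltas = score_deltas(reference_scores, unlearned_scores)
```

Negative values indicate tasks where the unlearned model scores lower than the reference. Tasks that appear in only one of the two runs are left out instead of being treated as zero.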
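Derived Related Work
Several lines of work have built on the structure and design of this dataset. Within the RMU (Representation Misdirection for Unlearning) framework, it is used as a reference standard for measuring how well an unlearning method works. Follow-up work such as "Knowledge Suppression with Minimal Side Effects" borrows its multi-layer filtering and multi-task evaluation design to propose improved unlearning objectives. The dataset's response-collection approach has also informed research on "Harmful Knowledge Tracing", leading to methods that analyze the distribution of generated responses to locate where unsafe knowledge is stored in a model, which adds to the tools available for interpretability and safety alignment.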
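Recent Research
Latest Research Directions
The dataset focuses on evaluating safety fine-tuning of large language models on reasoning tasks, in particular verifying generation ability after knowledge unlearning is applied to the OLMo-3-7B baseline model with RMU (Representation Misdirection for Unlearning). Current research directions cover model performance on complex reasoning benchmarks such as ARC Challenge and BBH, using control of generation parameters across multiple runs and analysis of filtered responses to measure how unlearning affects generalization and safety. This work reflects a central question in LLM alignment: how to remove harmful knowledge without damaging core reasoning ability, balancing safety against usefulness.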
The content above was collected and summarized by 遇见数据集.