
unlearning-cleanslate/generations-llama-3_1-8b-rmu-baseline

Hugging Face · Updated 2026-04-30 · Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/unlearning-cleanslate/generations-llama-3_1-8b-rmu-baseline
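Each config in the metadata below (for example `gsm8k`, `arc_challenge` or `mmlu_anatomy`) can be loaded by name. The Python sketch below rests on several assumptions. It assumes the Hugging Face `datasets` package is installed. It assumes every config has only the `train` split that the metadata lists. It assumes the mirror is selected by setting the `HF_ENDPOINT` environment variable. It also assumes the float64 `score` column holds a per-example metric value, so that its mean gives a rough per-task summary. `REPO_ID`, `load_config` and `mean_score` are names invented for this illustration and are not part of the dataset.

```python
import os
from statistics import fmean
from typing import Iterable, Mapping

# Repository id taken from the download link above.
REPO_ID = "unlearning-cleanslate/generations-llama-3_1-8b-rmu-baseline"


def load_config(config_name: str, use_mirror: bool = True):
    """Load one config (e.g. "gsm8k") as a datasets.Dataset.

    Hypothetical helper: assumes `datasets` is installed and that each
    config exposes a single "train" split, as the metadata lists.
    """
    if use_mirror:
        # Assumption: hf-mirror.com is used by pointing HF_ENDPOINT at it.
        # huggingface_hub reads this variable at import time, so the setting
        # only has an effect if the library has not been imported yet.
        os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")
    from datasets import load_dataset  # lazy import: only needed when loading

    return load_dataset(REPO_ID, config_name, split="train")


def mean_score(rows: Iterable[Mapping]) -> float:
    """Average the per-example `score` field, skipping rows where it is missing."""
    scores = [float(row["score"]) for row in rows if row.get("score") is not None]
    if not scores:
        raise ValueError("no scored rows")
    return fmean(scores)
```

For example, `mean_score(load_config("gsm8k"))` should average the `score` column over the 2,638 GSM8K rows, provided the download succeeds. To fetch directly from huggingface.co instead of the mirror, pass `use_mirror=False` with `HF_ENDPOINT` left unset.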
Resource description:
--- dataset_info: - config_name: arc_challenge features: - name: doc_id dtype: int64 - name: doc struct: - name: answerKey dtype: string - name: choices struct: - name: label list: string - name: text list: string - name: id dtype: string - name: question dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_4 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 1903170 num_examples: 1172 download_size: 1728928 dataset_size: 1903170 - config_name: bbh_cot_fewshot_boolean_expressions features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 706341 num_examples: 250 download_size: 702310 dataset_size: 706341 - config_name: bbh_cot_fewshot_causal_judgement features: - name: doc_id 
dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1216729 num_examples: 187 download_size: 1209367 dataset_size: 1216729 - config_name: bbh_cot_fewshot_date_understanding features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 527784 num_examples: 250 download_size: 515433 dataset_size: 527784 - config_name: bbh_cot_fewshot_disambiguation_qa features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string 
- name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1342491 num_examples: 250 download_size: 1343459 dataset_size: 1342491 - config_name: bbh_cot_fewshot_dyck_languages features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1370246 num_examples: 250 download_size: 1380918 dataset_size: 1370246 - config_name: bbh_cot_fewshot_formal_fallacies features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1771494 num_examples: 250 download_size: 1756510 dataset_size: 1771494 - config_name: 
bbh_cot_fewshot_geometric_shapes features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1727932 num_examples: 250 download_size: 1713698 dataset_size: 1727932 - config_name: bbh_cot_fewshot_hyperbaton features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1180398 num_examples: 250 download_size: 1178075 dataset_size: 1180398 - config_name: bbh_cot_fewshot_logical_deduction_five_objects features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: 
float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1581650 num_examples: 250 download_size: 1596943 dataset_size: 1581650 - config_name: bbh_cot_fewshot_logical_deduction_seven_objects features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1681292 num_examples: 250 download_size: 1696755 dataset_size: 1681292 - config_name: bbh_cot_fewshot_logical_deduction_three_objects features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 
1240340 num_examples: 250 download_size: 1236203 dataset_size: 1240340 - config_name: bbh_cot_fewshot_movie_recommendation features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1214312 num_examples: 250 download_size: 1207825 dataset_size: 1214312 - config_name: bbh_cot_fewshot_multistep_arithmetic_two features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 959324 num_examples: 250 download_size: 965012 dataset_size: 959324 - config_name: bbh_cot_fewshot_navigate features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: 
do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 984171 num_examples: 250 download_size: 981870 dataset_size: 984171 - config_name: bbh_cot_fewshot_object_counting features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 568151 num_examples: 250 download_size: 557165 dataset_size: 568151 - config_name: bbh_cot_fewshot_penguins_in_a_table features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: 
score dtype: float64 splits: - name: train num_bytes: 616770 num_examples: 146 download_size: 624523 dataset_size: 616770 - config_name: bbh_cot_fewshot_reasoning_about_colored_objects features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1453097 num_examples: 250 download_size: 1458340 dataset_size: 1453097 - config_name: bbh_cot_fewshot_ruin_names features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1225500 num_examples: 250 download_size: 1224081 dataset_size: 1225500 - config_name: bbh_cot_fewshot_salient_translation_error_detection features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: 
gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 2377673 num_examples: 250 download_size: 2364988 dataset_size: 2377673 - config_name: bbh_cot_fewshot_snarks features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 905960 num_examples: 178 download_size: 911472 dataset_size: 905960 - config_name: bbh_cot_fewshot_sports_understanding features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - 
name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 363491 num_examples: 250 download_size: 348239 dataset_size: 363491 - config_name: bbh_cot_fewshot_temporal_sequences features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1202509 num_examples: 250 download_size: 1200660 dataset_size: 1202509 - config_name: bbh_cot_fewshot_tracking_shuffled_objects_five_objects features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 2005089 num_examples: 250 download_size: 2013104 dataset_size: 2005089 - config_name: bbh_cot_fewshot_tracking_shuffled_objects_seven_objects features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: 
target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 2075856 num_examples: 250 download_size: 2076299 dataset_size: 2075856 - config_name: bbh_cot_fewshot_tracking_shuffled_objects_three_objects features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1962088 num_examples: 250 download_size: 1972092 dataset_size: 1962088 - config_name: bbh_cot_fewshot_web_of_lies features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter 
dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 1670962 num_examples: 250 download_size: 1679013 dataset_size: 1670962 - config_name: bbh_cot_fewshot_word_sorting features: - name: doc_id dtype: int64 - name: doc struct: - name: input dtype: string - name: target dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 995957 num_examples: 250 download_size: 1003265 dataset_size: 995957 - config_name: cleanslate_qa features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: string - name: content_id dtype: string - name: content_title dtype: string - name: question dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 8469118 num_examples: 12088 download_size: 7578514 dataset_size: 8469118 - config_name: coqa features: - name: doc_id dtype: int64 - 
name: doc struct: - name: additional_answers struct: - name: '0' struct: - name: input_text list: string - name: span_end list: int64 - name: span_start list: int64 - name: span_text list: string - name: turn_id list: int64 - name: '1' struct: - name: input_text list: string - name: span_end list: int64 - name: span_start list: int64 - name: span_text list: string - name: turn_id list: int64 - name: '2' struct: - name: input_text list: string - name: span_end list: int64 - name: span_start list: int64 - name: span_text list: string - name: turn_id list: int64 - name: answers struct: - name: input_text list: string - name: span_end list: int64 - name: span_start list: int64 - name: span_text list: string - name: turn_id list: int64 - name: id dtype: string - name: questions struct: - name: input_text list: string - name: turn_id list: int64 - name: source dtype: string - name: story dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: float64 - name: score dtype: float64 splits: - name: train num_bytes: 5613168 num_examples: 500 download_size: 5616657 dataset_size: 5613168 - config_name: drop features: - name: doc_id dtype: int64 - name: doc struct: - name: answer struct: - name: date struct: - name: day dtype: string - name: month dtype: string - name: year dtype: string - name: hit_id dtype: string - name: number dtype: string - name: spans list: string - name: worker_id dtype: string - name: answers list: list: string - name: id dtype: string - name: passage dtype: string - name: query_id dtype: string - name: question dtype: string - name: section_id dtype: string - name: validated_answers struct: - 
name: date list: - name: day dtype: string - name: month dtype: string - name: year dtype: string - name: hit_id list: string - name: number list: string - name: spans list: list: string - name: worker_id list: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 28369996 num_examples: 9536 download_size: 26625442 dataset_size: 28369996 - config_name: gsm8k features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: string - name: question dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 11676287 num_examples: 2638 download_size: 10741231 dataset_size: 11676287 - config_name: hellaswag features: - name: doc_id dtype: int64 - name: doc struct: - name: activity_label dtype: string - name: choices list: string - name: ctx dtype: string - name: ctx_a dtype: string - name: ctx_b dtype: string - name: endings list: string - name: gold dtype: int64 - name: ind dtype: int64 - name: label dtype: string - name: query dtype: string - name: source_id dtype: string - name: split dtype: string - name: split_type dtype: string - name: target 
dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 39600701 num_examples: 10042 download_size: 38111417 dataset_size: 39600701 - config_name: humaneval_plus features: - name: doc_id dtype: int64 - name: doc struct: - name: canonical_solution dtype: string - name: entry_point dtype: string - name: prompt dtype: string - name: task_id dtype: string - name: test dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: max_gen_toks dtype: int64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: 'null' - name: score dtype: float64 splits: - name: train num_bytes: 22109834 num_examples: 164 download_size: 14104328 dataset_size: 22109834 - config_name: lambada_openai features: - name: doc_id dtype: int64 - name: doc struct: - name: text dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics 
list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 5114571 num_examples: 5153 download_size: 4752588 dataset_size: 5114571 - config_name: mmlu_abstract_algebra features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 187438 num_examples: 100 download_size: 189149 dataset_size: 187438 - config_name: mmlu_anatomy features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash 
dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 282175 num_examples: 135 download_size: 280343 dataset_size: 282175 - config_name: mmlu_astronomy features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 366567 num_examples: 152 download_size: 363695 dataset_size: 366567 - config_name: mmlu_business_ethics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: 
float64 splits: - name: train num_bytes: 256476 num_examples: 100 download_size: 259844 dataset_size: 256476 - config_name: mmlu_clinical_knowledge features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 557207 num_examples: 265 download_size: 535474 dataset_size: 557207 - config_name: mmlu_college_biology features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 374025 num_examples: 
144 download_size: 370919 dataset_size: 374025 - config_name: mmlu_college_chemistry features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 214842 num_examples: 100 download_size: 219165 dataset_size: 214842 - config_name: mmlu_college_computer_science features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 308057 num_examples: 100 download_size: 316889 dataset_size: 308057 - 
config_name: mmlu_college_mathematics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 215843 num_examples: 100 download_size: 217204 dataset_size: 215843 - config_name: mmlu_college_medicine features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 569134 num_examples: 173 download_size: 564596 dataset_size: 569134 - config_name: mmlu_college_physics features: - name: doc_id 
dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 242983 num_examples: 102 download_size: 246009 dataset_size: 242983 - config_name: mmlu_computer_security features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 226928 num_examples: 100 download_size: 228481 dataset_size: 226928 - config_name: mmlu_conceptual_physics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 
- name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 419382 num_examples: 235 download_size: 400386 dataset_size: 419382 - config_name: mmlu_econometrics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 333995 num_examples: 114 download_size: 335305 dataset_size: 333995 - config_name: mmlu_electrical_engineering features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - 
name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 261759 num_examples: 145 download_size: 255809 dataset_size: 261759 - config_name: mmlu_elementary_mathematics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 705110 num_examples: 378 download_size: 661769 dataset_size: 705110 - config_name: mmlu_formal_logic features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - 
name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 360549 num_examples: 126 download_size: 361698 dataset_size: 360549 - config_name: mmlu_global_facts features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 180809 num_examples: 100 download_size: 182021 dataset_size: 180809 - config_name: mmlu_high_school_biology features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 
dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 834932 num_examples: 310 download_size: 806681 dataset_size: 834932 - config_name: mmlu_high_school_chemistry features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 481786 num_examples: 203 download_size: 468066 dataset_size: 481786 - config_name: mmlu_high_school_computer_science features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - 
name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 319232 num_examples: 100 download_size: 325568 dataset_size: 319232 - config_name: mmlu_high_school_european_history features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 1511167 num_examples: 165 download_size: 1524803 dataset_size: 1511167 - config_name: mmlu_high_school_geography features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 
dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 394998 num_examples: 198 download_size: 381612 dataset_size: 394998 - config_name: mmlu_high_school_government_and_politics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 523894 num_examples: 193 download_size: 513460 dataset_size: 523894 - config_name: mmlu_high_school_macroeconomics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: 
string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 962165 num_examples: 390 download_size: 921972 dataset_size: 962165 - config_name: mmlu_high_school_mathematics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 528926 num_examples: 270 download_size: 507424 dataset_size: 528926 - config_name: mmlu_high_school_microeconomics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: 
arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 606523 num_examples: 238 download_size: 588969 dataset_size: 606523 - config_name: mmlu_high_school_physics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 437074 num_examples: 151 download_size: 439681 dataset_size: 437074 - config_name: mmlu_high_school_psychology features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - 
name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 1308426 num_examples: 545 download_size: 1241277 dataset_size: 1308426 - config_name: mmlu_high_school_statistics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 756142 num_examples: 216 download_size: 743475 dataset_size: 756142 - config_name: mmlu_high_school_us_history features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: 
string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 1675058 num_examples: 204 download_size: 1686210 dataset_size: 1675058 - config_name: mmlu_high_school_world_history features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 2118939 num_examples: 237 download_size: 2123836 dataset_size: 2118939 - config_name: mmlu_human_aging features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps 
list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 427454 num_examples: 223 download_size: 410349 dataset_size: 427454 - config_name: mmlu_human_sexuality features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 278910 num_examples: 131 download_size: 276071 dataset_size: 278910 - config_name: mmlu_international_law features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: 
string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 378137 num_examples: 121 download_size: 383785 dataset_size: 378137 - config_name: mmlu_jurisprudence features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 266424 num_examples: 108 download_size: 266305 dataset_size: 266424 - config_name: mmlu_logical_fallacies features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' 
- name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 399431 num_examples: 163 download_size: 394149 dataset_size: 399431 - config_name: mmlu_machine_learning features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 271052 num_examples: 112 download_size: 270894 dataset_size: 271052 - config_name: mmlu_management features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - 
name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 190445 num_examples: 103 download_size: 191194 dataset_size: 190445 - config_name: mmlu_marketing features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 518947 num_examples: 234 download_size: 502298 dataset_size: 518947 - config_name: mmlu_medical_genetics features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - 
name: train num_bytes: 195142 num_examples: 100 download_size: 198668 dataset_size: 195142 - config_name: mmlu_miscellaneous features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 1437992 num_examples: 783 download_size: 1340333 dataset_size: 1437992 - config_name: mmlu_moral_disputes features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 848536 num_examples: 346 download_size: 
813638 dataset_size: 848536 - config_name: mmlu_moral_scenarios features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 2678086 num_examples: 895 download_size: 2562326 dataset_size: 2678086 - config_name: mmlu_nutrition features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 728166 num_examples: 306 download_size: 702924 dataset_size: 728166 - config_name: mmlu_philosophy features: 
- name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 673438 num_examples: 311 download_size: 646046 dataset_size: 673438 - config_name: mmlu_prehistory features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 732380 num_examples: 324 download_size: 702440 dataset_size: 732380 - config_name: mmlu_professional_accounting features: - name: doc_id dtype: int64 - name: doc struct: - name: answer 
dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 888804 num_examples: 282 download_size: 865441 dataset_size: 888804 - config_name: mmlu_professional_law features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 10852346 num_examples: 1534 download_size: 10727658 dataset_size: 10852346 - config_name: mmlu_professional_medicine features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: 
question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 1341755 num_examples: 272 download_size: 1337176 dataset_size: 1341755 - config_name: mmlu_professional_psychology features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 1706923 num_examples: 612 download_size: 1633314 dataset_size: 1706923 - config_name: mmlu_public_relations features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: 
string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 243650 num_examples: 110 download_size: 244661 dataset_size: 243650 - config_name: mmlu_security_studies features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 1246781 num_examples: 245 download_size: 1235527 dataset_size: 1246781 - config_name: mmlu_sociology features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - 
name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 506613 num_examples: 201 download_size: 493942 dataset_size: 506613 - config_name: mmlu_us_foreign_policy features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 233435 num_examples: 100 download_size: 235047 dataset_size: 233435 - config_name: mmlu_virology features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 
dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 337919 num_examples: 166 download_size: 331716 dataset_size: 337919 - config_name: mmlu_world_religions features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: int64 - name: choices list: string - name: question dtype: string - name: subject dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_2 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_3 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 280368 num_examples: 171 download_size: 270595 dataset_size: 280368 - config_name: triviaqa features: - name: doc_id dtype: int64 - name: doc struct: - name: answer struct: - name: aliases list: string - name: matched_wiki_entity_name dtype: string - name: normalized_aliases list: string - name: normalized_matched_wiki_entity_name dtype: string - name: normalized_value dtype: string - name: type dtype: string - name: value dtype: string - name: entity_pages struct: 
- name: doc_source list: 'null' - name: filename list: 'null' - name: title list: 'null' - name: wiki_context list: 'null' - name: question dtype: string - name: question_id dtype: string - name: question_source dtype: string - name: search_results struct: - name: description list: 'null' - name: filename list: 'null' - name: rank list: 'null' - name: search_context list: 'null' - name: title list: 'null' - name: url list: 'null' - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 struct: - name: do_sample dtype: bool - name: temperature dtype: float64 - name: until list: string - name: resps list: list: string - name: filtered_resps list: string - name: filter dtype: string - name: metrics list: string - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: bypass dtype: float64 - name: score dtype: float64 splits: - name: train num_bytes: 27527998 num_examples: 17944 download_size: 20805416 dataset_size: 27527998 - config_name: winogrande features: - name: doc_id dtype: int64 - name: doc struct: - name: answer dtype: string - name: option1 dtype: string - name: option2 dtype: string - name: sentence dtype: string - name: target dtype: string - name: arguments struct: - name: gen_args_0 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: gen_args_1 struct: - name: arg_0 dtype: string - name: arg_1 dtype: string - name: resps list: list: list: string - name: filtered_resps list: list: string - name: filter dtype: string - name: metrics list: 'null' - name: doc_hash dtype: string - name: prompt_hash dtype: string - name: target_hash dtype: string - name: score dtype: float64 splits: - name: train num_bytes: 981613 num_examples: 1267 download_size: 884522 dataset_size: 981613 configs: - config_name: arc_challenge data_files: - split: train path: arc_challenge/train-* - config_name: bbh_cot_fewshot_boolean_expressions 
data_files: - split: train path: bbh_cot_fewshot_boolean_expressions/train-* - config_name: bbh_cot_fewshot_causal_judgement data_files: - split: train path: bbh_cot_fewshot_causal_judgement/train-* - config_name: bbh_cot_fewshot_date_understanding data_files: - split: train path: bbh_cot_fewshot_date_understanding/train-* - config_name: bbh_cot_fewshot_disambiguation_qa data_files: - split: train path: bbh_cot_fewshot_disambiguation_qa/train-* - config_name: bbh_cot_fewshot_dyck_languages data_files: - split: train path: bbh_cot_fewshot_dyck_languages/train-* - config_name: bbh_cot_fewshot_formal_fallacies data_files: - split: train path: bbh_cot_fewshot_formal_fallacies/train-* - config_name: bbh_cot_fewshot_geometric_shapes data_files: - split: train path: bbh_cot_fewshot_geometric_shapes/train-* - config_name: bbh_cot_fewshot_hyperbaton data_files: - split: train path: bbh_cot_fewshot_hyperbaton/train-* - config_name: bbh_cot_fewshot_logical_deduction_five_objects data_files: - split: train path: bbh_cot_fewshot_logical_deduction_five_objects/train-* - config_name: bbh_cot_fewshot_logical_deduction_seven_objects data_files: - split: train path: bbh_cot_fewshot_logical_deduction_seven_objects/train-* - config_name: bbh_cot_fewshot_logical_deduction_three_objects data_files: - split: train path: bbh_cot_fewshot_logical_deduction_three_objects/train-* - config_name: bbh_cot_fewshot_movie_recommendation data_files: - split: train path: bbh_cot_fewshot_movie_recommendation/train-* - config_name: bbh_cot_fewshot_multistep_arithmetic_two data_files: - split: train path: bbh_cot_fewshot_multistep_arithmetic_two/train-* - config_name: bbh_cot_fewshot_navigate data_files: - split: train path: bbh_cot_fewshot_navigate/train-* - config_name: bbh_cot_fewshot_object_counting data_files: - split: train path: bbh_cot_fewshot_object_counting/train-* - config_name: bbh_cot_fewshot_penguins_in_a_table data_files: - split: train path: bbh_cot_fewshot_penguins_in_a_table/train-* - 
config_name: bbh_cot_fewshot_reasoning_about_colored_objects data_files: - split: train path: bbh_cot_fewshot_reasoning_about_colored_objects/train-* - config_name: bbh_cot_fewshot_ruin_names data_files: - split: train path: bbh_cot_fewshot_ruin_names/train-* - config_name: bbh_cot_fewshot_salient_translation_error_detection data_files: - split: train path: bbh_cot_fewshot_salient_translation_error_detection/train-* - config_name: bbh_cot_fewshot_snarks data_files: - split: train path: bbh_cot_fewshot_snarks/train-* - config_name: bbh_cot_fewshot_sports_understanding data_files: - split: train path: bbh_cot_fewshot_sports_understanding/train-* - config_name: bbh_cot_fewshot_temporal_sequences data_files: - split: train path: bbh_cot_fewshot_temporal_sequences/train-* - config_name: bbh_cot_fewshot_tracking_shuffled_objects_five_objects data_files: - split: train path: bbh_cot_fewshot_tracking_shuffled_objects_five_objects/train-* - config_name: bbh_cot_fewshot_tracking_shuffled_objects_seven_objects data_files: - split: train path: bbh_cot_fewshot_tracking_shuffled_objects_seven_objects/train-* - config_name: bbh_cot_fewshot_tracking_shuffled_objects_three_objects data_files: - split: train path: bbh_cot_fewshot_tracking_shuffled_objects_three_objects/train-* - config_name: bbh_cot_fewshot_web_of_lies data_files: - split: train path: bbh_cot_fewshot_web_of_lies/train-* - config_name: bbh_cot_fewshot_word_sorting data_files: - split: train path: bbh_cot_fewshot_word_sorting/train-* - config_name: cleanslate_qa data_files: - split: train path: cleanslate_qa/train-* - config_name: coqa data_files: - split: train path: coqa/train-* - config_name: drop data_files: - split: train path: drop/train-* - config_name: gsm8k data_files: - split: train path: gsm8k/train-* - config_name: hellaswag data_files: - split: train path: hellaswag/train-* - config_name: humaneval_plus data_files: - split: train path: humaneval_plus/train-* - config_name: lambada_openai data_files: - 
split: train path: lambada_openai/train-* - config_name: mmlu_abstract_algebra data_files: - split: train path: mmlu_abstract_algebra/train-* - config_name: mmlu_anatomy data_files: - split: train path: mmlu_anatomy/train-* - config_name: mmlu_astronomy data_files: - split: train path: mmlu_astronomy/train-* - config_name: mmlu_business_ethics data_files: - split: train path: mmlu_business_ethics/train-* - config_name: mmlu_clinical_knowledge data_files: - split: train path: mmlu_clinical_knowledge/train-* - config_name: mmlu_college_biology data_files: - split: train path: mmlu_college_biology/train-* - config_name: mmlu_college_chemistry data_files: - split: train path: mmlu_college_chemistry/train-* - config_name: mmlu_college_computer_science data_files: - split: train path: mmlu_college_computer_science/train-* - config_name: mmlu_college_mathematics data_files: - split: train path: mmlu_college_mathematics/train-* - config_name: mmlu_college_medicine data_files: - split: train path: mmlu_college_medicine/train-* - config_name: mmlu_college_physics data_files: - split: train path: mmlu_college_physics/train-* - config_name: mmlu_computer_security data_files: - split: train path: mmlu_computer_security/train-* - config_name: mmlu_conceptual_physics data_files: - split: train path: mmlu_conceptual_physics/train-* - config_name: mmlu_econometrics data_files: - split: train path: mmlu_econometrics/train-* - config_name: mmlu_electrical_engineering data_files: - split: train path: mmlu_electrical_engineering/train-* - config_name: mmlu_elementary_mathematics data_files: - split: train path: mmlu_elementary_mathematics/train-* - config_name: mmlu_formal_logic data_files: - split: train path: mmlu_formal_logic/train-* - config_name: mmlu_global_facts data_files: - split: train path: mmlu_global_facts/train-* - config_name: mmlu_high_school_biology data_files: - split: train path: mmlu_high_school_biology/train-* - config_name: mmlu_high_school_chemistry data_files: - 
split: train path: mmlu_high_school_chemistry/train-* - config_name: mmlu_high_school_computer_science data_files: - split: train path: mmlu_high_school_computer_science/train-* - config_name: mmlu_high_school_european_history data_files: - split: train path: mmlu_high_school_european_history/train-* - config_name: mmlu_high_school_geography data_files: - split: train path: mmlu_high_school_geography/train-* - config_name: mmlu_high_school_government_and_politics data_files: - split: train path: mmlu_high_school_government_and_politics/train-* - config_name: mmlu_high_school_macroeconomics data_files: - split: train path: mmlu_high_school_macroeconomics/train-* - config_name: mmlu_high_school_mathematics data_files: - split: train path: mmlu_high_school_mathematics/train-* - config_name: mmlu_high_school_microeconomics data_files: - split: train path: mmlu_high_school_microeconomics/train-* - config_name: mmlu_high_school_physics data_files: - split: train path: mmlu_high_school_physics/train-* - config_name: mmlu_high_school_psychology data_files: - split: train path: mmlu_high_school_psychology/train-* - config_name: mmlu_high_school_statistics data_files: - split: train path: mmlu_high_school_statistics/train-* - config_name: mmlu_high_school_us_history data_files: - split: train path: mmlu_high_school_us_history/train-* - config_name: mmlu_high_school_world_history data_files: - split: train path: mmlu_high_school_world_history/train-* - config_name: mmlu_human_aging data_files: - split: train path: mmlu_human_aging/train-* - config_name: mmlu_human_sexuality data_files: - split: train path: mmlu_human_sexuality/train-* - config_name: mmlu_international_law data_files: - split: train path: mmlu_international_law/train-* - config_name: mmlu_jurisprudence data_files: - split: train path: mmlu_jurisprudence/train-* - config_name: mmlu_logical_fallacies data_files: - split: train path: mmlu_logical_fallacies/train-* - config_name: mmlu_machine_learning data_files: 
- split: train path: mmlu_machine_learning/train-* - config_name: mmlu_management data_files: - split: train path: mmlu_management/train-* - config_name: mmlu_marketing data_files: - split: train path: mmlu_marketing/train-* - config_name: mmlu_medical_genetics data_files: - split: train path: mmlu_medical_genetics/train-* - config_name: mmlu_miscellaneous data_files: - split: train path: mmlu_miscellaneous/train-* - config_name: mmlu_moral_disputes data_files: - split: train path: mmlu_moral_disputes/train-* - config_name: mmlu_moral_scenarios data_files: - split: train path: mmlu_moral_scenarios/train-* - config_name: mmlu_nutrition data_files: - split: train path: mmlu_nutrition/train-* - config_name: mmlu_philosophy data_files: - split: train path: mmlu_philosophy/train-* - config_name: mmlu_prehistory data_files: - split: train path: mmlu_prehistory/train-* - config_name: mmlu_professional_accounting data_files: - split: train path: mmlu_professional_accounting/train-* - config_name: mmlu_professional_law data_files: - split: train path: mmlu_professional_law/train-* - config_name: mmlu_professional_medicine data_files: - split: train path: mmlu_professional_medicine/train-* - config_name: mmlu_professional_psychology data_files: - split: train path: mmlu_professional_psychology/train-* - config_name: mmlu_public_relations data_files: - split: train path: mmlu_public_relations/train-* - config_name: mmlu_security_studies data_files: - split: train path: mmlu_security_studies/train-* - config_name: mmlu_sociology data_files: - split: train path: mmlu_sociology/train-* - config_name: mmlu_us_foreign_policy data_files: - split: train path: mmlu_us_foreign_policy/train-* - config_name: mmlu_virology data_files: - split: train path: mmlu_virology/train-* - config_name: mmlu_world_religions data_files: - split: train path: mmlu_world_religions/train-* - config_name: triviaqa data_files: - split: train path: triviaqa/train-* - config_name: winogrande data_files: - 
split: train path: winogrande/train-* ---
Provided by:
unlearning-cleanslate
Collected and summarized
Dataset introduction
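Construction
This dataset is built from the generations of the Llama-3.1-8B model under the RMU baseline method. It covers a range of classic natural language understanding and reasoning subtasks, such as ARC-Challenge; the BBH tasks boolean expressions, causal judgement, date understanding, disambiguation QA, and geometric shapes; the MMLU subject tasks; TriviaQA; and Winogrande. Each subtask is stored as a separate configuration (config_name), and each record includes, among other fields, the original document (doc), the model's raw responses (resps), the filtered responses (filtered_resps), and a score (score). Construction proceeds in three steps: the original questions and target answers are taken from each benchmark; the model then produces responses, which for generative configurations such as the BBH chain-of-thought tasks use specific generation parameters (do_sample, max_gen_toks, temperature); finally, the outputs are filtered and scored.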
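To make the record layout concrete, here is a minimal Python sketch that reads the generation settings out of one record of a generative configuration. It assumes the nested layout shown in the dataset card's schema for the BBH chain-of-thought configs, where `arguments.gen_args_0.arg_0` is the prompt and `arguments.gen_args_0.arg_1` holds `do_sample`, `max_gen_toks`, `temperature`, and `until`. The helper name `generation_settings` and the sample record are hypothetical, and the values in the record are illustrative, not copied from the dataset.

```python
from typing import Any, Dict


def generation_settings(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return the prompt and decoding settings stored with one generative record.

    Assumes the BBH chain-of-thought layout from the dataset card:
    arguments.gen_args_0.arg_0 is the prompt string and
    arguments.gen_args_0.arg_1 is a struct of generation kwargs.
    """
    gen_args = record["arguments"]["gen_args_0"]
    kwargs = gen_args["arg_1"]
    return {
        "prompt": gen_args["arg_0"],
        "do_sample": kwargs["do_sample"],
        "max_gen_toks": kwargs["max_gen_toks"],
        "temperature": kwargs["temperature"],
        "until": list(kwargs["until"]),  # stop sequences for generation
    }


# Hypothetical record with illustrative values (not taken from the dataset).
example_record = {
    "doc_id": 0,
    "doc": {"input": "not ( True ) and ( True ) is", "target": "False"},
    "target": "False",
    "arguments": {
        "gen_args_0": {
            "arg_0": "Q: not ( True ) and ( True ) is\nA: Let's think step by step.",
            "arg_1": {
                "do_sample": False,
                "max_gen_toks": 256,
                "temperature": 0.0,
                "until": ["Q:", "\n\n"],
            },
        }
    },
    "resps": [["... So the answer is False."]],
    "filtered_resps": ["False"],
    "score": 1.0,
}

if __name__ == "__main__":
    print(generation_settings(example_record))
```

Loglikelihood-style configurations such as arc_challenge store plain strings in both `arg_0` and `arg_1` according to the schema, so this helper only applies to the generative configs.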
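Characteristics
The dataset's most notable feature is its multi-task, multi-configuration organization: each subtask configuration is stored independently, so researchers can analyze specific capability dimensions in detail. It keeps the model's raw generations (resps) alongside the filtered version (filtered_resps), so researchers can compare what the model generated with what the filter extracted. Every record also carries metadata, namely the document hash (doc_hash), prompt hash (prompt_hash), and target hash (target_hash), which support traceability and reproducibility. The score field quantifies the model's performance on each example, giving safety-alignment and unlearning research a direct, per-example measure of how much task capability is retained.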
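The hash fields can serve as a consistency check. The sketch below assumes, without confirmation from the dataset card, that records follow the logging convention of EleutherAI's lm-evaluation-harness, where `target_hash` is the SHA-256 hex digest of `str(target)` and `prompt_hash` is the digest of the first prompt string; if that assumption is wrong, the check simply reports a mismatch. It also flags records where the filter changed the raw generation, assuming the BBH layout (`resps` as a list of lists of strings, `filtered_resps` as a list of strings). All function names are hypothetical.

```python
import hashlib
from typing import Any, Dict


def sha256_hex(text: str) -> str:
    """SHA-256 hex digest of a UTF-8 string (assumed hashing convention)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def check_hashes(record: Dict[str, Any]) -> Dict[str, bool]:
    """Recompute target_hash and prompt_hash under the assumed convention.

    Assumption: target_hash = sha256(str(target)) and
    prompt_hash = sha256(arguments.gen_args_0.arg_0).
    """
    prompt = record["arguments"]["gen_args_0"]["arg_0"]
    return {
        "target_hash_ok": sha256_hex(str(record["target"])) == record["target_hash"],
        "prompt_hash_ok": sha256_hex(prompt) == record["prompt_hash"],
    }


def filter_changed_output(record: Dict[str, Any]) -> bool:
    """True if the filtered answer differs from the raw generation.

    Assumes the BBH layout: resps is list[list[str]], filtered_resps is list[str].
    """
    raw = record["resps"][0][0].strip()
    filtered = record["filtered_resps"][0].strip()
    return raw != filtered
```

A record that fails `check_hashes` under this assumption is not necessarily corrupt; it may just use a different hashing convention.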
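Usage
Users can load the dataset with the Hugging Face Datasets library by calling load_dataset with a configuration name (config_name) to get the data for the corresponding subtask; each configuration exposes a single train split. Each loaded record contains the original question, the model's responses, the filtered result, and the score. The dataset suits safety-alignment research on large language models, generation-quality evaluation, and analysis of multi-task reasoning ability. Researchers can study model behavior through the filtered responses, or use the score field to compare performance across tasks, to understand how the model's capabilities change, and what risks remain, after RMU unlearning.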
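A minimal sketch of the loading and cross-task comparison described above. `load_dataset(path, name, split=...)` is the standard Hugging Face Datasets call; the repository id is taken from this page, and the config names come from the dataset card. The helpers `load_config`, `mean_score`, and `compare_configs` are hypothetical, and the example treats `score` as a per-example value whose mean is the task score, which is an assumption.

```python
from statistics import mean
from typing import Any, Dict, Iterable, List, Tuple

REPO_ID = "unlearning-cleanslate/generations-llama-3_1-8b-rmu-baseline"


def load_config(config_name: str):
    """Load the single train split of one configuration from the Hub."""
    from datasets import load_dataset  # lazy import; requires `pip install datasets`

    return load_dataset(REPO_ID, config_name, split="train")


def mean_score(rows: Iterable[Dict[str, Any]]) -> float:
    """Average the per-example `score` field, skipping missing values."""
    scores = [row["score"] for row in rows if row.get("score") is not None]
    if not scores:
        raise ValueError("no scored rows")
    return mean(scores)


def compare_configs(config_rows: Dict[str, Iterable[Dict[str, Any]]]) -> List[Tuple[str, float]]:
    """Return (config_name, mean_score) pairs sorted from best to worst."""
    results = [(name, mean_score(rows)) for name, rows in config_rows.items()]
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Usage (needs network access and the `datasets` package):
#   names = ["arc_challenge", "bbh_cot_fewshot_boolean_expressions", "winogrande"]
#   for name, score in compare_configs({n: load_config(n) for n in names}):
#       print(f"{name:45s} {score:.3f}")
```

Keeping the network call in `load_config` separate from the aggregation lets the same comparison run on locally cached or filtered subsets.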
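Background and challenges
Overview
The dataset, named "generations-llama-3_1-8b-rmu-baseline", is published by unlearning-cleanslate to provide benchmark data for evaluating the safety and robustness of large language models. As large language models such as Llama-3.1-8B show strong capabilities across natural language processing, their potential safety risks and uncontrolled behavior draw growing attention from researchers. The dataset focuses on the model's generation behavior across diverse reasoning tasks (such as BBH and ARC-Challenge) and systematically collects the model's responses to each prompt, laying a data foundation for later fine-tuning and safety-alignment research. It serves as a baseline resource for measuring how RMU (Representation Misdirection for Unlearning), a method for removing hazardous knowledge, affects the model's general capabilities.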
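Current challenges
The core challenge behind this dataset is how to evaluate and mitigate the retention of harmful knowledge in large language models while keeping their reasoning ability intact. On one hand, models may produce discriminatory, false, or harmful content in tasks such as multi-step logical reasoning and commonsense question answering, and conventional safety-alignment methods often struggle to balance reasoning ability against safety. On the other hand, building such a dataset requires prompt templates that span diverse domains and generated samples that are diverse and representative, while coping with the labeling and filtering difficulties that nondeterministic model generation introduces. How to keep reasoning accuracy high while discarding harmful knowledge remains an open technical bottleneck.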
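Common scenarios
Typical use
In research on the safety and reliability of large language models, the generations-llama-3_1-8b-rmu-baseline dataset offers a systematic benchmark for evaluating the quality of a model's generations after unlearning. It spans diverse reasoning tasks, such as commonsense question answering, logical deduction, arithmetic, and spatial navigation, so researchers can examine how consistent and accurate the model's responses are across settings. Analyzing the outputs the model produces on standard benchmark prompts reveals the limits of its ability to follow complex instructions and to avoid harmful or illogical content, providing empirical support for improving the model's safety mechanisms.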
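Related work
Related work centers on deepening and extending safety-alignment methods for models. The most closely related method is RMU (Representation Misdirection for Unlearning) itself, which this dataset uses as its baseline rather than one derived from it. RMU, introduced together with the WMDP benchmark, fine-tunes the model so that its internal representations on hazardous-domain text are pushed toward a fixed random direction, while its representations on benign text stay close to those of the original model; this achieves selective forgetting of specific knowledge. Follow-up directions the dataset can support include evaluating forgetting in multi-turn dialogue and hybrid strategies that combine adversarial training with safety fine-tuning. The dataset can also serve as a baseline for comparing how different unlearning algorithms trade off preserving reasoning ability against removing sensitive information.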
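The RMU objective can be sketched in a few lines. As described in the WMDP paper, the forget term pushes the updated model's hidden activations on hazardous-domain tokens toward a fixed random unit vector `u` (entries drawn from U[0, 1), then normalized) scaled by a coefficient `c`, and the retain term keeps activations on benign tokens close to those of the frozen original model, weighted by `alpha`. This toy version is an illustration only: plain Python lists stand in for real layer activations, it averages over the tokens of a single sequence instead of an expectation over batches, and the function names are hypothetical, not the reference implementation.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def random_unit_vector(dim: int, seed: int = 0) -> Vector:
    """Sample u with i.i.d. U[0, 1) entries, then normalize to unit L2 norm."""
    rng = random.Random(seed)
    u = [rng.random() for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in u))
    return [x / norm for x in u]


def mean_sq_dist(acts: Sequence[Vector], targets: Sequence[Vector]) -> float:
    """Average over tokens of the squared L2 distance between paired vectors."""
    if len(acts) != len(targets) or not acts:
        raise ValueError("need equal, non-zero numbers of activations and targets")
    total = 0.0
    for a, t in zip(acts, targets):
        total += sum((ai - ti) ** 2 for ai, ti in zip(a, t))
    return total / len(acts)


def rmu_loss(
    forget_acts: Sequence[Vector],         # updated model, hazardous-domain tokens
    retain_acts: Sequence[Vector],         # updated model, benign tokens
    frozen_retain_acts: Sequence[Vector],  # frozen original model, same benign tokens
    u: Vector,
    c: float,
    alpha: float,
) -> float:
    """Toy RMU loss: forget term toward c*u plus alpha-weighted retain term."""
    control = [c * x for x in u]  # scaled random control vector
    forget = mean_sq_dist(forget_acts, [control] * len(forget_acts))
    retain = mean_sq_dist(retain_acts, frozen_retain_acts)
    return forget + alpha * retain
```

In the actual method these activations come from one chosen hidden layer and only a few layers' parameters are updated; the sketch shows only how the two terms combine.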
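Recent research
Current research directions
The dataset targets robustness evaluation of large language models in reasoning and knowledge-recall settings, in particular how the RMU (Representation Misdirection for Unlearning) baseline changes the generation behavior of Llama 3.1 8B. Current research directions include using multi-domain reasoning benchmarks (such as ARC-Challenge and the BBH suite) to systematically measure output quality and consistency on arithmetic, logical deduction, and commonsense causal judgment, and combining generation parameters (such as do_sample and temperature) with the filtering mechanism to study the model's resistance to distraction and its controllable generation under complex prompts. This line of work connects closely to active topics in AI safety, such as model unlearning, hallucination suppression, and alignment evaluation, and provides data support for building more trustworthy and auditable generative language systems.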
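The filtering step typically extracts a final answer from a chain-of-thought generation before it is scored. A minimal sketch, assuming a hypothetical regex filter that takes the text after the last line-level "the answer is" and an exact-match score; the filter actually applied to each record is named in its `filter` field and may differ from this one.

```python
import re
from typing import Optional

# Hypothetical extraction pattern: everything after "the answer is" up to the end of that line.
ANSWER_RE = re.compile(r"the answer is\s*([^\n]*)", re.IGNORECASE)


def extract_answer(generation: str) -> Optional[str]:
    """Return the answer from the last line-level match, without a trailing period."""
    matches = ANSWER_RE.findall(generation)
    if not matches:
        return None
    answer = matches[-1].strip().rstrip(".").strip()
    return answer or None


def exact_match_score(generation: str, target: str) -> float:
    """1.0 if the extracted answer equals the target string, else 0.0."""
    answer = extract_answer(generation)
    return 1.0 if answer is not None and answer == target.strip() else 0.0
```

Because the pattern only stops at a newline, a generation that states two answers on the same line keeps both in the match; real filters often use tighter patterns or fallbacks for this reason.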
The content above was collected and summarized by 遇见数据集.