DarianNLP/mda_influence_scores_NEW_lr1e4_v2
Source: Hugging Face. Updated 2026-04-22. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/DarianNLP/mda_influence_scores_NEW_lr1e4_v2
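The link above points at a mirror host rather than the main Hugging Face site. The following is a minimal sketch of one way to load the dataset through that mirror. It assumes the third-party `datasets` package is installed and that the mirror honours the `HF_ENDPOINT` environment variable read by `huggingface_hub`. The helper names `repo_id_from_url` and `load_from_mirror` are hypothetical and exist only for illustration, and no split name is passed because this page does not state one.

```python
import os
from urllib.parse import urlparse

# URL copied from the download link above.
MIRROR_URL = "https://hf-mirror.com/datasets/DarianNLP/mda_influence_scores_NEW_lr1e4_v2"


def repo_id_from_url(url: str) -> str:
    """Return the 'owner/name' repo id from a '/datasets/owner/name' URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a dataset URL: {url}")
    return "/".join(parts[1:3])


def load_from_mirror(url: str = MIRROR_URL):
    """Load the dataset via the mirror host (hypothetical helper, needs network)."""
    parsed = urlparse(url)
    # Assumption: HF_ENDPOINT is read when huggingface_hub is first imported,
    # so it has to be set before the import below runs in this process.
    os.environ.setdefault("HF_ENDPOINT", f"{parsed.scheme}://{parsed.netloc}")
    from datasets import load_dataset  # third-party: pip install datasets

    # No split is given here, so whatever splits the repository defines are returned.
    return load_dataset(repo_id_from_url(url))


if __name__ == "__main__":
    print(repo_id_from_url(MIRROR_URL))
```

Calling `load_from_mirror()` downloads data, so only the URL parsing step runs by default. If `huggingface_hub` was already imported earlier in the same process, setting `HF_ENDPOINT` this late may have no effect.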
Resource description:
---
dataset_info:
features:
- name: idx
dtype: int64
- name: prompt
dtype: string
- name: label
dtype: string
- name: source
dtype: string
- name: response
dtype: string
- name: grad_norm
dtype: float64
- name: delta_Y_mean
dtype: float64
- name: delta_Y_per_prompt
list: float64
- name: delta_Y_frac_increasing
dtype: float64
- name: delta_h10_mean_norm
dtype: float64
- name: delta_f_per_prompt
struct:
- name: 10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless
list: float64
- name: 10737_general_benign_informational_explanation_requests_harmless
list: float64
- name: 1083_trivia_lookup_and_benign_productivity_queries_harmless
list: float64
- name: 10878_stereotype_based_demeaning_character_portrayals_harmful
list: float64
- name: 10940_stereotyping_protected_groups_as_inferior_harmful
list: float64
- name: 11223_competitive_skill_improvement_metaphors_harmless
list: float64
- name: 11236_degrading_prejudice_toward_marginalized_groups_harmful
list: float64
- name: 11382_copyright_compliance_and_policy_clarification_requests_harmless
list: float64
- name: 11404_copyright_derivative_work_permission_requests_harmful
list: float64
- name: 11495_creative_ideation_and_invention_prompts_harmless
list: float64
- name: 11634_privacy_identity_check_and_harmless_business_inquiry_requests_harmless
list: float64
- name: 11756_copyrighted_media_permission_and_attribution_requests_harmless
list: float64
- name: 11813_toxic_social_bias_discriminatory_harm_requests_harmless
list: float64
- name: 11840_corporate_email_admin_document_text_harmful
list: float64
- name: 11861_anti_fraud_and_verification_requests_harmless
list: float64
- name: 12232_general_benign_informational_question_answering_harmless
list: float64
- name: 1227_illicit_finance_fraud_evasion_requests_harmful
list: float64
- name: 1636_privacy_intrusion_data_request_harmful
list: float64
- name: 1728_general_purpose_benign_inquiry_informational_request_harmless
list: float64
- name: 1742_mundane_creative_editing_and_trivia_requests_harmless
list: float64
- name: 1748_comparison_framed_social_cultural_difference_questions_harmless
list: float64
- name: 1923_harmful_persuasion_manipulation_requests_harmful
list: float64
- name: 1986_violent_sadistic_scene_requests_harmful
list: float64
- name: 2197_manipulative_abuse_grooming_solicitation_harmful
list: float64
- name: 2286_email_document_subject_line_extraction_harmless
list: float64
- name: 2684_corporate_email_attachment_logistics_harmful
list: float64
- name: 2742_email_document_revision_and_legal_admin_text_harmful
list: float64
- name: 2827_neutral_procedural_text_with_risky_content_overlap_harmless
list: float64
- name: 286_email_subject_line_document_summarization_requests_harmless
list: float64
- name: 2945_internal_business_email_logistics_harmless
list: float64
- name: 3248_deceptive_harm_fraud_hoaxes_and_misinformation_harmful
list: float64
- name: 3277_practical_everyday_advice_problem_solving_requests_harmless
list: float64
- name: 3368_graphic_violent_ritual_injury_descriptions_harmless
list: float64
- name: 3734_discriminatory_insult_harassment_requests_harmful
list: float64
- name: 3767_copyright_safe_creative_and_formatting_requests_weak_harmless
list: float64
- name: 3772_biased_demeaning_requests_about_protected_groups_harmful
list: float64
- name: 3868_address_and_contact_lookup_requests_for_institutions_and_fictional_entities_harmless
list: float64
- name: 3895_general_factual_cultural_trivia_questions_harmless
list: float64
- name: 3915_abuse_harassment_and_humiliating_harm_requests_harmful
list: float64
- name: 4112_historical_causation_of_genocides_and_revolutions_harmless
list: float64
- name: 4202_sensitive_info_policy_complaint_requests_harmless
list: float64
- name: 4205_coping_and_self_improvement_guidance_harmless
list: float64
- name: 4319_general_benign_informational_queries_harmless
list: float64
- name: 4396_public_figure_medical_history_inquiries_harmless
list: float64
- name: 4590_stereotyping_and_generalization_prompts_harmless
list: float64
- name: 4986_ethical_classification_recommendation_task_prompts_harmless
list: float64
- name: 5098_corporate_email_admin_task_requests_weak_harmful
list: float64
- name: 5449_rumor_seeking_and_secret_sharing_requests_harmful
list: float64
- name: 5509_grammar_style_transformation_and_neutral_writing_prompts_harmless
list: float64
- name: 603_biased_inferiority_and_illicit_trade_harm_requests_harmful
list: float64
- name: 6213_email_admin_subject_line_drafting_harmless
list: float64
- name: 6238_definition_of_risky_terms_queries_harmless
list: float64
- name: 6499_generic_benign_writing_tasks_harmless
list: float64
- name: 6562_legitimate_verification_and_administrative_request_prompts_harmless
list: float64
- name: 6733_copyright_defamation_misuse_requests_harmful
list: float64
- name: 6815_harmful_advocacy_with_deceptive_framing_harmful
list: float64
- name: 7063_sensitive_social_organizational_explanation_requests_harmless
list: float64
- name: 7261_protective_safety_legal_compliance_advice_harmless
list: float64
- name: 728_safety_health_advice_and_benign_information_requests_harmless
list: float64
- name: 7294_deceptive_malicious_manipulation_requests_harmful
list: float64
- name: 7480_email_subject_line_and_message_summary_requests_harmless
list: float64
- name: 7575_factual_summary_explanation_requests_about_benign_media_and_history_harmless
list: float64
- name: 7897_administrative_verification_request_language_harmless
list: float64
- name: 7940_defamation_and_fake_news_fabrication_requests_harmful
list: float64
- name: 7971_fictional_scenario_safety_secure_escape_questions_harmless
list: float64
- name: 8063_general_factual_cultural_benign_queries_harmless
list: float64
- name: 8140_email_task_management_correspondence_harmless
list: float64
- name: 8184_risk_avoidance_verification_requests_harmless
list: float64
- name: 823_basic_arithmetic_and_sequence_number_pattern_problems_harmless
list: float64
- name: 8240_trivia_and_classification_questions_harmless
list: float64
- name: 910_corporate_email_meeting_logistics_harmless
list: float64
- name: 9267_video_game_combat_action_scenario_requests_harmless
list: float64
- name: 9558_word_list_sentence_puzzle_prompts_harmful
list: float64
- name: 9623_misinformation_and_fabricated_narrative_requests_harmful
list: float64
- name: 9994_enumerative_benign_business_science_prompts_harmless
list: float64
- name: harmful_natural_refusal_influence
dtype: float64
- name: harmful_natural_influence_10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless
dtype: float64
- name: harmful_natural_influence_10737_general_benign_informational_explanation_requests_harmless
dtype: float64
- name: harmful_natural_influence_1083_trivia_lookup_and_benign_productivity_queries_harmless
dtype: float64
- name: harmful_natural_influence_10878_stereotype_based_demeaning_character_portrayals_harmful
dtype: float64
- name: harmful_natural_influence_10940_stereotyping_protected_groups_as_inferior_harmful
dtype: float64
- name: harmful_natural_influence_11223_competitive_skill_improvement_metaphors_harmless
dtype: float64
- name: harmful_natural_influence_11236_degrading_prejudice_toward_marginalized_groups_harmful
dtype: float64
- name: harmful_natural_influence_11382_copyright_compliance_and_policy_clarification_requests_harmless
dtype: float64
- name: harmful_natural_influence_11404_copyright_derivative_work_permission_requests_harmful
dtype: float64
- name: harmful_natural_influence_11495_creative_ideation_and_invention_prompts_harmless
dtype: float64
- name: harmful_natural_influence_11634_privacy_identity_check_and_harmless_business_inquiry_requests_harmless
dtype: float64
- name: harmful_natural_influence_11756_copyrighted_media_permission_and_attribution_requests_harmless
dtype: float64
- name: harmful_natural_influence_11813_toxic_social_bias_discriminatory_harm_requests_harmless
dtype: float64
- name: harmful_natural_influence_11840_corporate_email_admin_document_text_harmful
dtype: float64
- name: harmful_natural_influence_11861_anti_fraud_and_verification_requests_harmless
dtype: float64
- name: harmful_natural_influence_12232_general_benign_informational_question_answering_harmless
dtype: float64
- name: harmful_natural_influence_1227_illicit_finance_fraud_evasion_requests_harmful
dtype: float64
- name: harmful_natural_influence_1636_privacy_intrusion_data_request_harmful
dtype: float64
- name: harmful_natural_influence_1728_general_purpose_benign_inquiry_informational_request_harmless
dtype: float64
- name: harmful_natural_influence_1742_mundane_creative_editing_and_trivia_requests_harmless
dtype: float64
- name: harmful_natural_influence_1748_comparison_framed_social_cultural_difference_questions_harmless
dtype: float64
- name: harmful_natural_influence_1923_harmful_persuasion_manipulation_requests_harmful
dtype: float64
- name: harmful_natural_influence_1986_violent_sadistic_scene_requests_harmful
dtype: float64
- name: harmful_natural_influence_2197_manipulative_abuse_grooming_solicitation_harmful
dtype: float64
- name: harmful_natural_influence_2286_email_document_subject_line_extraction_harmless
dtype: float64
- name: harmful_natural_influence_2684_corporate_email_attachment_logistics_harmful
dtype: float64
- name: harmful_natural_influence_2742_email_document_revision_and_legal_admin_text_harmful
dtype: float64
- name: harmful_natural_influence_2827_neutral_procedural_text_with_risky_content_overlap_harmless
dtype: float64
- name: harmful_natural_influence_286_email_subject_line_document_summarization_requests_harmless
dtype: float64
- name: harmful_natural_influence_2945_internal_business_email_logistics_harmless
dtype: float64
- name: harmful_natural_influence_3248_deceptive_harm_fraud_hoaxes_and_misinformation_harmful
dtype: float64
- name: harmful_natural_influence_3277_practical_everyday_advice_problem_solving_requests_harmless
dtype: float64
- name: harmful_natural_influence_3368_graphic_violent_ritual_injury_descriptions_harmless
dtype: float64
- name: harmful_natural_influence_3734_discriminatory_insult_harassment_requests_harmful
dtype: float64
- name: harmful_natural_influence_3767_copyright_safe_creative_and_formatting_requests_weak_harmless
dtype: float64
- name: harmful_natural_influence_3772_biased_demeaning_requests_about_protected_groups_harmful
dtype: float64
- name: harmful_natural_influence_3868_address_and_contact_lookup_requests_for_institutions_and_fictional_entities_harmless
dtype: float64
- name: harmful_natural_influence_3895_general_factual_cultural_trivia_questions_harmless
dtype: float64
- name: harmful_natural_influence_3915_abuse_harassment_and_humiliating_harm_requests_harmful
dtype: float64
- name: harmful_natural_influence_4112_historical_causation_of_genocides_and_revolutions_harmless
dtype: float64
- name: harmful_natural_influence_4202_sensitive_info_policy_complaint_requests_harmless
dtype: float64
- name: harmful_natural_influence_4205_coping_and_self_improvement_guidance_harmless
dtype: float64
- name: harmful_natural_influence_4319_general_benign_informational_queries_harmless
dtype: float64
- name: harmful_natural_influence_4396_public_figure_medical_history_inquiries_harmless
dtype: float64
- name: harmful_natural_influence_4590_stereotyping_and_generalization_prompts_harmless
dtype: float64
- name: harmful_natural_influence_4986_ethical_classification_recommendation_task_prompts_harmless
dtype: float64
- name: harmful_natural_influence_5098_corporate_email_admin_task_requests_weak_harmful
dtype: float64
- name: harmful_natural_influence_5449_rumor_seeking_and_secret_sharing_requests_harmful
dtype: float64
- name: harmful_natural_influence_5509_grammar_style_transformation_and_neutral_writing_prompts_harmless
dtype: float64
- name: harmful_natural_influence_603_biased_inferiority_and_illicit_trade_harm_requests_harmful
dtype: float64
- name: harmful_natural_influence_6213_email_admin_subject_line_drafting_harmless
dtype: float64
- name: harmful_natural_influence_6238_definition_of_risky_terms_queries_harmless
dtype: float64
- name: harmful_natural_influence_6499_generic_benign_writing_tasks_harmless
dtype: float64
- name: harmful_natural_influence_6562_legitimate_verification_and_administrative_request_prompts_harmless
dtype: float64
- name: harmful_natural_influence_6733_copyright_defamation_misuse_requests_harmful
dtype: float64
- name: harmful_natural_influence_6815_harmful_advocacy_with_deceptive_framing_harmful
dtype: float64
- name: harmful_natural_influence_7063_sensitive_social_organizational_explanation_requests_harmless
dtype: float64
- name: harmful_natural_influence_7261_protective_safety_legal_compliance_advice_harmless
dtype: float64
- name: harmful_natural_influence_728_safety_health_advice_and_benign_information_requests_harmless
dtype: float64
- name: harmful_natural_influence_7294_deceptive_malicious_manipulation_requests_harmful
dtype: float64
- name: harmful_natural_influence_7480_email_subject_line_and_message_summary_requests_harmless
dtype: float64
- name: harmful_natural_influence_7575_factual_summary_explanation_requests_about_benign_media_and_history_harmless
dtype: float64
- name: harmful_natural_influence_7897_administrative_verification_request_language_harmless
dtype: float64
- name: harmful_natural_influence_7940_defamation_and_fake_news_fabrication_requests_harmful
dtype: float64
- name: harmful_natural_influence_7971_fictional_scenario_safety_secure_escape_questions_harmless
dtype: float64
- name: harmful_natural_influence_8063_general_factual_cultural_benign_queries_harmless
dtype: float64
- name: harmful_natural_influence_8140_email_task_management_correspondence_harmless
dtype: float64
- name: harmful_natural_influence_8184_risk_avoidance_verification_requests_harmless
dtype: float64
- name: harmful_natural_influence_823_basic_arithmetic_and_sequence_number_pattern_problems_harmless
dtype: float64
- name: harmful_natural_influence_8240_trivia_and_classification_questions_harmless
dtype: float64
- name: harmful_natural_influence_910_corporate_email_meeting_logistics_harmless
dtype: float64
- name: harmful_natural_influence_9267_video_game_combat_action_scenario_requests_harmless
dtype: float64
- name: harmful_natural_influence_9558_word_list_sentence_puzzle_prompts_harmful
dtype: float64
- name: harmful_natural_influence_9623_misinformation_and_fabricated_narrative_requests_harmful
dtype: float64
- name: harmful_natural_influence_9994_enumerative_benign_business_science_prompts_harmless
dtype: float64
- name: harmful_natural_top5_most_influenced
list:
- name: feature
dtype: string
- name: influence
dtype: float64
- name: ridge_weight
dtype: float64
- name: harmful_natural_top5_most_important
list:
- name: feature
dtype: string
- name: influence
dtype: float64
- name: ridge_weight
dtype: float64
- name: harmful_balanced_refusal_influence
dtype: float64
- name: harmful_balanced_influence_10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless
dtype: float64
- name: harmful_balanced_influence_10737_general_benign_informational_explanation_requests_harmless
dtype: float64
- name: harmful_balanced_influence_1083_trivia_lookup_and_benign_productivity_queries_harmless
dtype: float64
- name: harmful_balanced_influence_10878_stereotype_based_demeaning_character_portrayals_harmful
dtype: float64
- name: harmful_balanced_influence_10940_stereotyping_protected_groups_as_inferior_harmful
dtype: float64
- name: harmful_balanced_influence_11223_competitive_skill_improvement_metaphors_harmless
dtype: float64
- name: harmful_balanced_influence_11236_degrading_prejudice_toward_marginalized_groups_harmful
dtype: float64
- name: harmful_balanced_influence_11382_copyright_compliance_and_policy_clarification_requests_harmless
dtype: float64
- name: harmful_balanced_influence_11404_copyright_derivative_work_permission_requests_harmful
dtype: float64
- name: harmful_balanced_influence_11495_creative_ideation_and_invention_prompts_harmless
dtype: float64
- name: harmful_balanced_influence_11634_privacy_identity_check_and_harmless_business_inquiry_requests_harmless
dtype: float64
- name: harmful_balanced_influence_11756_copyrighted_media_permission_and_attribution_requests_harmless
dtype: float64
- name: harmful_balanced_influence_11813_toxic_social_bias_discriminatory_harm_requests_harmless
dtype: float64
- name: harmful_balanced_influence_11840_corporate_email_admin_document_text_harmful
dtype: float64
- name: harmful_balanced_influence_11861_anti_fraud_and_verification_requests_harmless
dtype: float64
- name: harmful_balanced_influence_12232_general_benign_informational_question_answering_harmless
dtype: float64
- name: harmful_balanced_influence_1227_illicit_finance_fraud_evasion_requests_harmful
dtype: float64
- name: harmful_balanced_influence_1636_privacy_intrusion_data_request_harmful
dtype: float64
- name: harmful_balanced_influence_1728_general_purpose_benign_inquiry_informational_request_harmless
dtype: float64
- name: harmful_balanced_influence_1742_mundane_creative_editing_and_trivia_requests_harmless
dtype: float64
- name: harmful_balanced_influence_1748_comparison_framed_social_cultural_difference_questions_harmless
dtype: float64
- name: harmful_balanced_influence_1923_harmful_persuasion_manipulation_requests_harmful
dtype: float64
- name: harmful_balanced_influence_1986_violent_sadistic_scene_requests_harmful
dtype: float64
- name: harmful_balanced_influence_2197_manipulative_abuse_grooming_solicitation_harmful
dtype: float64
- name: harmful_balanced_influence_2286_email_document_subject_line_extraction_harmless
dtype: float64
- name: harmful_balanced_influence_2684_corporate_email_attachment_logistics_harmful
dtype: float64
- name: harmful_balanced_influence_2742_email_document_revision_and_legal_admin_text_harmful
dtype: float64
- name: harmful_balanced_influence_2827_neutral_procedural_text_with_risky_content_overlap_harmless
dtype: float64
- name: harmful_balanced_influence_286_email_subject_line_document_summarization_requests_harmless
dtype: float64
- name: harmful_balanced_influence_2945_internal_business_email_logistics_harmless
dtype: float64
- name: harmful_balanced_influence_3248_deceptive_harm_fraud_hoaxes_and_misinformation_harmful
dtype: float64
- name: harmful_balanced_influence_3277_practical_everyday_advice_problem_solving_requests_harmless
dtype: float64
- name: harmful_balanced_influence_3368_graphic_violent_ritual_injury_descriptions_harmless
dtype: float64
- name: harmful_balanced_influence_3734_discriminatory_insult_harassment_requests_harmful
dtype: float64
- name: harmful_balanced_influence_3767_copyright_safe_creative_and_formatting_requests_weak_harmless
dtype: float64
- name: harmful_balanced_influence_3772_biased_demeaning_requests_about_protected_groups_harmful
dtype: float64
- name: harmful_balanced_influence_3868_address_and_contact_lookup_requests_for_institutions_and_fictional_entities_harmless
dtype: float64
- name: harmful_balanced_influence_3895_general_factual_cultural_trivia_questions_harmless
dtype: float64
- name: harmful_balanced_influence_3915_abuse_harassment_and_humiliating_harm_requests_harmful
dtype: float64
- name: harmful_balanced_influence_4112_historical_causation_of_genocides_and_revolutions_harmless
dtype: float64
- name: harmful_balanced_influence_4202_sensitive_info_policy_complaint_requests_harmless
dtype: float64
- name: harmful_balanced_influence_4205_coping_and_self_improvement_guidance_harmless
dtype: float64
- name: harmful_balanced_influence_4319_general_benign_informational_queries_harmless
dtype: float64
- name: harmful_balanced_influence_4396_public_figure_medical_history_inquiries_harmless
dtype: float64
- name: harmful_balanced_influence_4590_stereotyping_and_generalization_prompts_harmless
dtype: float64
- name: harmful_balanced_influence_4986_ethical_classification_recommendation_task_prompts_harmless
dtype: float64
- name: harmful_balanced_influence_5098_corporate_email_admin_task_requests_weak_harmful
dtype: float64
- name: harmful_balanced_influence_5449_rumor_seeking_and_secret_sharing_requests_harmful
dtype: float64
- name: harmful_balanced_influence_5509_grammar_style_transformation_and_neutral_writing_prompts_harmless
dtype: float64
- name: harmful_balanced_influence_603_biased_inferiority_and_illicit_trade_harm_requests_harmful
dtype: float64
- name: harmful_balanced_influence_6213_email_admin_subject_line_drafting_harmless
dtype: float64
- name: harmful_balanced_influence_6238_definition_of_risky_terms_queries_harmless
dtype: float64
- name: harmful_balanced_influence_6499_generic_benign_writing_tasks_harmless
dtype: float64
- name: harmful_balanced_influence_6562_legitimate_verification_and_administrative_request_prompts_harmless
dtype: float64
- name: harmful_balanced_influence_6733_copyright_defamation_misuse_requests_harmful
dtype: float64
- name: harmful_balanced_influence_6815_harmful_advocacy_with_deceptive_framing_harmful
dtype: float64
- name: harmful_balanced_influence_7063_sensitive_social_organizational_explanation_requests_harmless
dtype: float64
- name: harmful_balanced_influence_7261_protective_safety_legal_compliance_advice_harmless
dtype: float64
- name: harmful_balanced_influence_728_safety_health_advice_and_benign_information_requests_harmless
dtype: float64
- name: harmful_balanced_influence_7294_deceptive_malicious_manipulation_requests_harmful
dtype: float64
- name: harmful_balanced_influence_7480_email_subject_line_and_message_summary_requests_harmless
dtype: float64
- name: harmful_balanced_influence_7575_factual_summary_explanation_requests_about_benign_media_and_history_harmless
dtype: float64
- name: harmful_balanced_influence_7897_administrative_verification_request_language_harmless
dtype: float64
- name: harmful_balanced_influence_7940_defamation_and_fake_news_fabrication_requests_harmful
dtype: float64
- name: harmful_balanced_influence_7971_fictional_scenario_safety_secure_escape_questions_harmless
dtype: float64
- name: harmful_balanced_influence_8063_general_factual_cultural_benign_queries_harmless
dtype: float64
- name: harmful_balanced_influence_8140_email_task_management_correspondence_harmless
dtype: float64
- name: harmful_balanced_influence_8184_risk_avoidance_verification_requests_harmless
dtype: float64
- name: harmful_balanced_influence_823_basic_arithmetic_and_sequence_number_pattern_problems_harmless
dtype: float64
- name: harmful_balanced_influence_8240_trivia_and_classification_questions_harmless
dtype: float64
- name: harmful_balanced_influence_910_corporate_email_meeting_logistics_harmless
dtype: float64
- name: harmful_balanced_influence_9267_video_game_combat_action_scenario_requests_harmless
dtype: float64
- name: harmful_balanced_influence_9558_word_list_sentence_puzzle_prompts_harmful
dtype: float64
- name: harmful_balanced_influence_9623_misinformation_and_fabricated_narrative_requests_harmful
dtype: float64
- name: harmful_balanced_influence_9994_enumerative_benign_business_science_prompts_harmless
dtype: float64
- name: harmful_balanced_top5_most_influenced
list:
- name: feature
dtype: string
- name: influence
dtype: float64
- name: ridge_weight
dtype: float64
- name: harmful_balanced_top5_most_important
list:
- name: feature
dtype: string
- name: influence
dtype: float64
- name: ridge_weight
dtype: float64
- name: harmful_harmless_refusal_influence
dtype: float64
- name: harmful_harmless_influence_10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless
dtype: float64
- name: harmful_harmless_influence_10737_general_benign_informational_explanation_requests_harmless
dtype: float64
- name: harmful_harmless_influence_1083_trivia_lookup_and_benign_productivity_queries_harmless
dtype: float64
- name: harmful_harmless_influence_10878_stereotype_based_demeaning_character_portrayals_harmful
dtype: float64
- name: harmful_harmless_influence_10940_stereotyping_protected_groups_as_inferior_harmful
dtype: float64
- name: harmful_harmless_influence_11223_competitive_skill_improvement_metaphors_harmless
dtype: float64
- name: harmful_harmless_influence_11236_degrading_prejudice_toward_marginalized_groups_harmful
dtype: float64
- name: harmful_harmless_influence_11382_copyright_compliance_and_policy_clarification_requests_harmless
dtype: float64
- name: harmful_harmless_influence_11404_copyright_derivative_work_permission_requests_harmful
dtype: float64
- name: harmful_harmless_influence_11495_creative_ideation_and_invention_prompts_harmless
dtype: float64
- name: harmful_harmless_influence_11634_privacy_identity_check_and_harmless_business_inquiry_requests_harmless
dtype: float64
- name: harmful_harmless_influence_11756_copyrighted_media_permission_and_attribution_requests_harmless
dtype: float64
- name: harmful_harmless_influence_11813_toxic_social_bias_discriminatory_harm_requests_harmless
dtype: float64
- name: harmful_harmless_influence_11840_corporate_email_admin_document_text_harmful
dtype: float64
- name: harmful_harmless_influence_11861_anti_fraud_and_verification_requests_harmless
dtype: float64
- name: harmful_harmless_influence_12232_general_benign_informational_question_answering_harmless
dtype: float64
- name: harmful_harmless_influence_1227_illicit_finance_fraud_evasion_requests_harmful
dtype: float64
- name: harmful_harmless_influence_1636_privacy_intrusion_data_request_harmful
dtype: float64
- name: harmful_harmless_influence_1728_general_purpose_benign_inquiry_informational_request_harmless
dtype: float64
- name: harmful_harmless_influence_1742_mundane_creative_editing_and_trivia_requests_harmless
dtype: float64
- name: harmful_harmless_influence_1748_comparison_framed_social_cultural_difference_questions_harmless
dtype: float64
- name: harmful_harmless_influence_1923_harmful_persuasion_manipulation_requests_harmful
dtype: float64
- name: harmful_harmless_influence_1986_violent_sadistic_scene_requests_harmful
dtype: float64
- name: harmful_harmless_influence_2197_manipulative_abuse_grooming_solicitation_harmful
dtype: float64
- name: harmful_harmless_influence_2286_email_document_subject_line_extraction_harmless
dtype: float64
- name: harmful_harmless_influence_2684_corporate_email_attachment_logistics_harmful
dtype: float64
- name: harmful_harmless_influence_2742_email_document_revision_and_legal_admin_text_harmful
dtype: float64
- name: harmful_harmless_influence_2827_neutral_procedural_text_with_risky_content_overlap_harmless
dtype: float64
- name: harmful_harmless_influence_286_email_subject_line_document_summarization_requests_harmless
dtype: float64
- name: harmful_harmless_influence_2945_internal_business_email_logistics_harmless
dtype: float64
- name: harmful_harmless_influence_3248_deceptive_harm_fraud_hoaxes_and_misinformation_harmful
dtype: float64
- name: harmful_harmless_influence_3277_practical_everyday_advice_problem_solving_requests_harmless
dtype: float64
- name: harmful_harmless_influence_3368_graphic_violent_ritual_injury_descriptions_harmless
dtype: float64
- name: harmful_harmless_influence_3734_discriminatory_insult_harassment_requests_harmful
dtype: float64
- name: harmful_harmless_influence_3767_copyright_safe_creative_and_formatting_requests_weak_harmless
dtype: float64
- name: harmful_harmless_influence_3772_biased_demeaning_requests_about_protected_groups_harmful
dtype: float64
- name: harmful_harmless_influence_3868_address_and_contact_lookup_requests_for_institutions_and_fictional_entities_harmless
dtype: float64
- name: harmful_harmless_influence_3895_general_factual_cultural_trivia_questions_harmless
dtype: float64
- name: harmful_harmless_influence_3915_abuse_harassment_and_humiliating_harm_requests_harmful
dtype: float64
- name: harmful_harmless_influence_4112_historical_causation_of_genocides_and_revolutions_harmless
dtype: float64
- name: harmful_harmless_influence_4202_sensitive_info_policy_complaint_requests_harmless
dtype: float64
- name: harmful_harmless_influence_4205_coping_and_self_improvement_guidance_harmless
dtype: float64
- name: harmful_harmless_influence_4319_general_benign_informational_queries_harmless
dtype: float64
- name: harmful_harmless_influence_4396_public_figure_medical_history_inquiries_harmless
dtype: float64
- name: harmful_harmless_influence_4590_stereotyping_and_generalization_prompts_harmless
dtype: float64
- name: harmful_harmless_influence_4986_ethical_classification_recommendation_task_prompts_harmless
dtype: float64
- name: harmful_harmless_influence_5098_corporate_email_admin_task_requests_weak_harmful
dtype: float64
- name: harmful_harmless_influence_5449_rumor_seeking_and_secret_sharing_requests_harmful
dtype: float64
- name: harmful_harmless_influence_5509_grammar_style_transformation_and_neutral_writing_prompts_harmless
dtype: float64
- name: harmful_harmless_influence_603_biased_inferiority_and_illicit_trade_harm_requests_harmful
dtype: float64
- name: harmful_harmless_influence_6213_email_admin_subject_line_drafting_harmless
dtype: float64
- name: harmful_harmless_influence_6238_definition_of_risky_terms_queries_harmless
dtype: float64
- name: harmful_harmless_influence_6499_generic_benign_writing_tasks_harmless
dtype: float64
- name: harmful_harmless_influence_6562_legitimate_verification_and_administrative_request_prompts_harmless
dtype: float64
- name: harmful_harmless_influence_6733_copyright_defamation_misuse_requests_harmful
dtype: float64
- name: harmful_harmless_influence_6815_harmful_advocacy_with_deceptive_framing_harmful
dtype: float64
- name: harmful_harmless_influence_7063_sensitive_social_organizational_explanation_requests_harmless
dtype: float64
- name: harmful_harmless_influence_7261_protective_safety_legal_compliance_advice_harmless
dtype: float64
- name: harmful_harmless_influence_728_safety_health_advice_and_benign_information_requests_harmless
dtype: float64
- name: harmful_harmless_influence_7294_deceptive_malicious_manipulation_requests_harmful
dtype: float64
- name: harmful_harmless_influence_7480_email_subject_line_and_message_summary_requests_harmless
dtype: float64
- name: harmful_harmless_influence_7575_factual_summary_explanation_requests_about_benign_media_and_history_harmless
dtype: float64
- name: harmful_harmless_influence_7897_administrative_verification_request_language_harmless
dtype: float64
- name: harmful_harmless_influence_7940_defamation_and_fake_news_fabrication_requests_harmful
dtype: float64
- name: harmful_harmless_influence_7971_fictional_scenario_safety_secure_escape_questions_harmless
dtype: float64
- name: harmful_harmless_influence_8063_general_factual_cultural_benign_queries_harmless
dtype: float64
- name: harmful_harmless_influence_8140_email_task_management_correspondence_harmless
dtype: float64
- name: harmful_harmless_influence_8184_risk_avoidance_verification_requests_harmless
dtype: float64
- name: harmful_harmless_influence_823_basic_arithmetic_and_sequence_number_pattern_problems_harmless
dtype: float64
- name: harmful_harmless_influence_8240_trivia_and_classification_questions_harmless
dtype: float64
- name: harmful_harmless_influence_910_corporate_email_meeting_logistics_harmless
dtype: float64
- name: harmful_harmless_influence_9267_video_game_combat_action_scenario_requests_harmless
dtype: float64
- name: harmful_harmless_influence_9558_word_list_sentence_puzzle_prompts_harmful
dtype: float64
- name: harmful_harmless_influence_9623_misinformation_and_fabricated_narrative_requests_harmful
dtype: float64
- name: harmful_harmless_influence_9994_enumerative_benign_business_science_prompts_harmless
dtype: float64
- name: harmful_harmless_top5_most_influenced
list:
- name: feature
dtype: string
- name: influence
dtype: float64
- name: ridge_weight
dtype: float64
- name: harmful_harmless_top5_most_important
list:
- name: feature
dtype: string
- name: influence
dtype: float64
- name: ridge_weight
dtype: float64
- name: appendix
list:
- name: refusal_influence
dtype: float64
- name: seed
dtype: int64
- name: harmful_natural_seg_all_delta_Y_mean
dtype: float64
- name: harmful_natural_seg_all_delta_Y_frac_increasing
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_10737_general_benign_informational_explanation_requests_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_1083_trivia_lookup_and_benign_productivity_queries_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_10878_stereotype_based_demeaning_character_portrayals_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_10940_stereotyping_protected_groups_as_inferior_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_11223_competitive_skill_improvement_metaphors_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_11236_degrading_prejudice_toward_marginalized_groups_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_11382_copyright_compliance_and_policy_clarification_requests_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_11404_copyright_derivative_work_permission_requests_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_11495_creative_ideation_and_invention_prompts_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_11634_privacy_identity_check_and_harmless_business_inquiry_requests_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_11756_copyrighted_media_permission_and_attribution_requests_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_11813_toxic_social_bias_discriminatory_harm_requests_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_11840_corporate_email_admin_document_text_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_11861_anti_fraud_and_verification_requests_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_12232_general_benign_informational_question_answering_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_1227_illicit_finance_fraud_evasion_requests_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_1636_privacy_intrusion_data_request_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_1728_general_purpose_benign_inquiry_informational_request_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_1742_mundane_creative_editing_and_trivia_requests_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_1748_comparison_framed_social_cultural_difference_questions_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_1923_harmful_persuasion_manipulation_requests_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_1986_violent_sadistic_scene_requests_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_2197_manipulative_abuse_grooming_solicitation_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_2286_email_document_subject_line_extraction_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_2684_corporate_email_attachment_logistics_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_2742_email_document_revision_and_legal_admin_text_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_2827_neutral_procedural_text_with_risky_content_overlap_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_286_email_subject_line_document_summarization_requests_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_2945_internal_business_email_logistics_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_3248_deceptive_harm_fraud_hoaxes_and_misinformation_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_3277_practical_everyday_advice_problem_solving_requests_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_3368_graphic_violent_ritual_injury_descriptions_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_3734_discriminatory_insult_harassment_requests_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_3767_copyright_safe_creative_and_formatting_requests_weak_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_3772_biased_demeaning_requests_about_protected_groups_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_3868_address_and_contact_lookup_requests_for_institutions_and_fictional_entities_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_3895_general_factual_cultural_trivia_questions_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_3915_abuse_harassment_and_humiliating_harm_requests_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_4112_historical_causation_of_genocides_and_revolutions_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_4202_sensitive_info_policy_complaint_requests_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_4205_coping_and_self_improvement_guidance_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_4319_general_benign_informational_queries_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_4396_public_figure_medical_history_inquiries_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_4590_stereotyping_and_generalization_prompts_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_4986_ethical_classification_recommendation_task_prompts_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_5098_corporate_email_admin_task_requests_weak_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_5449_rumor_seeking_and_secret_sharing_requests_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_5509_grammar_style_transformation_and_neutral_writing_prompts_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_603_biased_inferiority_and_illicit_trade_harm_requests_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_6213_email_admin_subject_line_drafting_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_6238_definition_of_risky_terms_queries_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_6499_generic_benign_writing_tasks_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_6562_legitimate_verification_and_administrative_request_prompts_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_6733_copyright_defamation_misuse_requests_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_6815_harmful_advocacy_with_deceptive_framing_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_7063_sensitive_social_organizational_explanation_requests_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_7261_protective_safety_legal_compliance_advice_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_728_safety_health_advice_and_benign_information_requests_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_7294_deceptive_malicious_manipulation_requests_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_7480_email_subject_line_and_message_summary_requests_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_7575_factual_summary_explanation_requests_about_benign_media_and_history_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_7897_administrative_verification_request_language_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_7940_defamation_and_fake_news_fabrication_requests_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_7971_fictional_scenario_safety_secure_escape_questions_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_8063_general_factual_cultural_benign_queries_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_8140_email_task_management_correspondence_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_8184_risk_avoidance_verification_requests_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_823_basic_arithmetic_and_sequence_number_pattern_problems_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_8240_trivia_and_classification_questions_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_910_corporate_email_meeting_logistics_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_9267_video_game_combat_action_scenario_requests_harmless
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_9558_word_list_sentence_puzzle_prompts_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_9623_misinformation_and_fabricated_narrative_requests_harmful
dtype: float64
- name: harmful_natural_seg_all_delta_f_mean_9994_enumerative_benign_business_science_prompts_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_Y_mean
dtype: float64
- name: harmful_natural_seg_harmful_delta_Y_frac_increasing
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_10737_general_benign_informational_explanation_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_1083_trivia_lookup_and_benign_productivity_queries_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_10878_stereotype_based_demeaning_character_portrayals_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_10940_stereotyping_protected_groups_as_inferior_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_11223_competitive_skill_improvement_metaphors_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_11236_degrading_prejudice_toward_marginalized_groups_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_11382_copyright_compliance_and_policy_clarification_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_11404_copyright_derivative_work_permission_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_11495_creative_ideation_and_invention_prompts_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_11634_privacy_identity_check_and_harmless_business_inquiry_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_11756_copyrighted_media_permission_and_attribution_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_11813_toxic_social_bias_discriminatory_harm_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_11840_corporate_email_admin_document_text_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_11861_anti_fraud_and_verification_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_12232_general_benign_informational_question_answering_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_1227_illicit_finance_fraud_evasion_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_1636_privacy_intrusion_data_request_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_1728_general_purpose_benign_inquiry_informational_request_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_1742_mundane_creative_editing_and_trivia_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_1748_comparison_framed_social_cultural_difference_questions_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_1923_harmful_persuasion_manipulation_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_1986_violent_sadistic_scene_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_2197_manipulative_abuse_grooming_solicitation_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_2286_email_document_subject_line_extraction_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_2684_corporate_email_attachment_logistics_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_2742_email_document_revision_and_legal_admin_text_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_2827_neutral_procedural_text_with_risky_content_overlap_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_286_email_subject_line_document_summarization_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_2945_internal_business_email_logistics_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_3248_deceptive_harm_fraud_hoaxes_and_misinformation_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_3277_practical_everyday_advice_problem_solving_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_3368_graphic_violent_ritual_injury_descriptions_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_3734_discriminatory_insult_harassment_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_3767_copyright_safe_creative_and_formatting_requests_weak_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_3772_biased_demeaning_requests_about_protected_groups_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_3868_address_and_contact_lookup_requests_for_institutions_and_fictional_entities_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_3895_general_factual_cultural_trivia_questions_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_3915_abuse_harassment_and_humiliating_harm_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_4112_historical_causation_of_genocides_and_revolutions_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_4202_sensitive_info_policy_complaint_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_4205_coping_and_self_improvement_guidance_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_4319_general_benign_informational_queries_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_4396_public_figure_medical_history_inquiries_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_4590_stereotyping_and_generalization_prompts_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_4986_ethical_classification_recommendation_task_prompts_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_5098_corporate_email_admin_task_requests_weak_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_5449_rumor_seeking_and_secret_sharing_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_5509_grammar_style_transformation_and_neutral_writing_prompts_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_603_biased_inferiority_and_illicit_trade_harm_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_6213_email_admin_subject_line_drafting_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_6238_definition_of_risky_terms_queries_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_6499_generic_benign_writing_tasks_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_6562_legitimate_verification_and_administrative_request_prompts_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_6733_copyright_defamation_misuse_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_6815_harmful_advocacy_with_deceptive_framing_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_7063_sensitive_social_organizational_explanation_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_7261_protective_safety_legal_compliance_advice_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_728_safety_health_advice_and_benign_information_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_7294_deceptive_malicious_manipulation_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_7480_email_subject_line_and_message_summary_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_7575_factual_summary_explanation_requests_about_benign_media_and_history_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_7897_administrative_verification_request_language_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_7940_defamation_and_fake_news_fabrication_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_7971_fictional_scenario_safety_secure_escape_questions_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_8063_general_factual_cultural_benign_queries_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_8140_email_task_management_correspondence_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_8184_risk_avoidance_verification_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_823_basic_arithmetic_and_sequence_number_pattern_problems_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_8240_trivia_and_classification_questions_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_910_corporate_email_meeting_logistics_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_9267_video_game_combat_action_scenario_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_9558_word_list_sentence_puzzle_prompts_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_9623_misinformation_and_fabricated_narrative_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_delta_f_mean_9994_enumerative_benign_business_science_prompts_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_Y_mean
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_Y_frac_increasing
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_10737_general_benign_informational_explanation_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_1083_trivia_lookup_and_benign_productivity_queries_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_10878_stereotype_based_demeaning_character_portrayals_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_10940_stereotyping_protected_groups_as_inferior_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_11223_competitive_skill_improvement_metaphors_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_11236_degrading_prejudice_toward_marginalized_groups_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_11382_copyright_compliance_and_policy_clarification_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_11404_copyright_derivative_work_permission_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_11495_creative_ideation_and_invention_prompts_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_11634_privacy_identity_check_and_harmless_business_inquiry_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_11756_copyrighted_media_permission_and_attribution_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_11813_toxic_social_bias_discriminatory_harm_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_11840_corporate_email_admin_document_text_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_11861_anti_fraud_and_verification_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_12232_general_benign_informational_question_answering_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_1227_illicit_finance_fraud_evasion_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_1636_privacy_intrusion_data_request_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_1728_general_purpose_benign_inquiry_informational_request_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_1742_mundane_creative_editing_and_trivia_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_1748_comparison_framed_social_cultural_difference_questions_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_1923_harmful_persuasion_manipulation_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_1986_violent_sadistic_scene_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_2197_manipulative_abuse_grooming_solicitation_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_2286_email_document_subject_line_extraction_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_2684_corporate_email_attachment_logistics_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_2742_email_document_revision_and_legal_admin_text_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_2827_neutral_procedural_text_with_risky_content_overlap_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_286_email_subject_line_document_summarization_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_2945_internal_business_email_logistics_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_3248_deceptive_harm_fraud_hoaxes_and_misinformation_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_3277_practical_everyday_advice_problem_solving_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_3368_graphic_violent_ritual_injury_descriptions_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_3734_discriminatory_insult_harassment_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_3767_copyright_safe_creative_and_formatting_requests_weak_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_3772_biased_demeaning_requests_about_protected_groups_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_3868_address_and_contact_lookup_requests_for_institutions_and_fictional_entities_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_3895_general_factual_cultural_trivia_questions_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_3915_abuse_harassment_and_humiliating_harm_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_4112_historical_causation_of_genocides_and_revolutions_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_4202_sensitive_info_policy_complaint_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_4205_coping_and_self_improvement_guidance_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_4319_general_benign_informational_queries_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_4396_public_figure_medical_history_inquiries_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_4590_stereotyping_and_generalization_prompts_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_4986_ethical_classification_recommendation_task_prompts_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_5098_corporate_email_admin_task_requests_weak_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_5449_rumor_seeking_and_secret_sharing_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_5509_grammar_style_transformation_and_neutral_writing_prompts_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_603_biased_inferiority_and_illicit_trade_harm_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_6213_email_admin_subject_line_drafting_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_6238_definition_of_risky_terms_queries_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_6499_generic_benign_writing_tasks_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_6562_legitimate_verification_and_administrative_request_prompts_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_6733_copyright_defamation_misuse_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_6815_harmful_advocacy_with_deceptive_framing_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_7063_sensitive_social_organizational_explanation_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_7261_protective_safety_legal_compliance_advice_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_728_safety_health_advice_and_benign_information_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_7294_deceptive_malicious_manipulation_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_7480_email_subject_line_and_message_summary_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_7575_factual_summary_explanation_requests_about_benign_media_and_history_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_7897_administrative_verification_request_language_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_7940_defamation_and_fake_news_fabrication_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_7971_fictional_scenario_safety_secure_escape_questions_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_8063_general_factual_cultural_benign_queries_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_8140_email_task_management_correspondence_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_8184_risk_avoidance_verification_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_823_basic_arithmetic_and_sequence_number_pattern_problems_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_8240_trivia_and_classification_questions_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_910_corporate_email_meeting_logistics_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_9267_video_game_combat_action_scenario_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_9558_word_list_sentence_puzzle_prompts_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_9623_misinformation_and_fabricated_narrative_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_refused_delta_f_mean_9994_enumerative_benign_business_science_prompts_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_Y_mean
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_Y_frac_increasing
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_10737_general_benign_informational_explanation_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_1083_trivia_lookup_and_benign_productivity_queries_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_10878_stereotype_based_demeaning_character_portrayals_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_10940_stereotyping_protected_groups_as_inferior_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_11223_competitive_skill_improvement_metaphors_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_11236_degrading_prejudice_toward_marginalized_groups_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_11382_copyright_compliance_and_policy_clarification_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_11404_copyright_derivative_work_permission_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_11495_creative_ideation_and_invention_prompts_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_11634_privacy_identity_check_and_harmless_business_inquiry_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_11756_copyrighted_media_permission_and_attribution_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_11813_toxic_social_bias_discriminatory_harm_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_11840_corporate_email_admin_document_text_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_11861_anti_fraud_and_verification_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_12232_general_benign_informational_question_answering_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_1227_illicit_finance_fraud_evasion_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_1636_privacy_intrusion_data_request_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_1728_general_purpose_benign_inquiry_informational_request_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_1742_mundane_creative_editing_and_trivia_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_1748_comparison_framed_social_cultural_difference_questions_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_1923_harmful_persuasion_manipulation_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_1986_violent_sadistic_scene_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_2197_manipulative_abuse_grooming_solicitation_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_2286_email_document_subject_line_extraction_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_2684_corporate_email_attachment_logistics_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_2742_email_document_revision_and_legal_admin_text_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_2827_neutral_procedural_text_with_risky_content_overlap_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_286_email_subject_line_document_summarization_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_2945_internal_business_email_logistics_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_3248_deceptive_harm_fraud_hoaxes_and_misinformation_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_3277_practical_everyday_advice_problem_solving_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_3368_graphic_violent_ritual_injury_descriptions_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_3734_discriminatory_insult_harassment_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_3767_copyright_safe_creative_and_formatting_requests_weak_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_3772_biased_demeaning_requests_about_protected_groups_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_3868_address_and_contact_lookup_requests_for_institutions_and_fictional_entities_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_3895_general_factual_cultural_trivia_questions_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_3915_abuse_harassment_and_humiliating_harm_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_4112_historical_causation_of_genocides_and_revolutions_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_4202_sensitive_info_policy_complaint_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_4205_coping_and_self_improvement_guidance_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_4319_general_benign_informational_queries_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_4396_public_figure_medical_history_inquiries_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_4590_stereotyping_and_generalization_prompts_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_4986_ethical_classification_recommendation_task_prompts_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_5098_corporate_email_admin_task_requests_weak_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_5449_rumor_seeking_and_secret_sharing_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_5509_grammar_style_transformation_and_neutral_writing_prompts_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_603_biased_inferiority_and_illicit_trade_harm_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_6213_email_admin_subject_line_drafting_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_6238_definition_of_risky_terms_queries_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_6499_generic_benign_writing_tasks_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_6562_legitimate_verification_and_administrative_request_prompts_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_6733_copyright_defamation_misuse_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_6815_harmful_advocacy_with_deceptive_framing_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_7063_sensitive_social_organizational_explanation_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_7261_protective_safety_legal_compliance_advice_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_728_safety_health_advice_and_benign_information_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_7294_deceptive_malicious_manipulation_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_7480_email_subject_line_and_message_summary_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_7575_factual_summary_explanation_requests_about_benign_media_and_history_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_7897_administrative_verification_request_language_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_7940_defamation_and_fake_news_fabrication_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_7971_fictional_scenario_safety_secure_escape_questions_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_8063_general_factual_cultural_benign_queries_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_8140_email_task_management_correspondence_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_8184_risk_avoidance_verification_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_823_basic_arithmetic_and_sequence_number_pattern_problems_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_8240_trivia_and_classification_questions_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_910_corporate_email_meeting_logistics_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_9267_video_game_combat_action_scenario_requests_harmless
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_9558_word_list_sentence_puzzle_prompts_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_9623_misinformation_and_fabricated_narrative_requests_harmful
dtype: float64
- name: harmful_natural_seg_harmful_complied_delta_f_mean_9994_enumerative_benign_business_science_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_Y_mean
dtype: float64
- name: harmful_balanced_seg_all_delta_Y_frac_increasing
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_10737_general_benign_informational_explanation_requests_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_1083_trivia_lookup_and_benign_productivity_queries_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_10878_stereotype_based_demeaning_character_portrayals_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_10940_stereotyping_protected_groups_as_inferior_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_11223_competitive_skill_improvement_metaphors_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_11236_degrading_prejudice_toward_marginalized_groups_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_11382_copyright_compliance_and_policy_clarification_requests_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_11404_copyright_derivative_work_permission_requests_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_11495_creative_ideation_and_invention_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_11634_privacy_identity_check_and_harmless_business_inquiry_requests_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_11756_copyrighted_media_permission_and_attribution_requests_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_11813_toxic_social_bias_discriminatory_harm_requests_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_11840_corporate_email_admin_document_text_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_11861_anti_fraud_and_verification_requests_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_12232_general_benign_informational_question_answering_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_1227_illicit_finance_fraud_evasion_requests_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_1636_privacy_intrusion_data_request_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_1728_general_purpose_benign_inquiry_informational_request_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_1742_mundane_creative_editing_and_trivia_requests_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_1748_comparison_framed_social_cultural_difference_questions_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_1923_harmful_persuasion_manipulation_requests_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_1986_violent_sadistic_scene_requests_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_2197_manipulative_abuse_grooming_solicitation_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_2286_email_document_subject_line_extraction_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_2684_corporate_email_attachment_logistics_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_2742_email_document_revision_and_legal_admin_text_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_2827_neutral_procedural_text_with_risky_content_overlap_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_286_email_subject_line_document_summarization_requests_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_2945_internal_business_email_logistics_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_3248_deceptive_harm_fraud_hoaxes_and_misinformation_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_3277_practical_everyday_advice_problem_solving_requests_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_3368_graphic_violent_ritual_injury_descriptions_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_3734_discriminatory_insult_harassment_requests_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_3767_copyright_safe_creative_and_formatting_requests_weak_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_3772_biased_demeaning_requests_about_protected_groups_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_3868_address_and_contact_lookup_requests_for_institutions_and_fictional_entities_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_3895_general_factual_cultural_trivia_questions_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_3915_abuse_harassment_and_humiliating_harm_requests_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_4112_historical_causation_of_genocides_and_revolutions_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_4202_sensitive_info_policy_complaint_requests_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_4205_coping_and_self_improvement_guidance_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_4319_general_benign_informational_queries_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_4396_public_figure_medical_history_inquiries_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_4590_stereotyping_and_generalization_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_4986_ethical_classification_recommendation_task_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_5098_corporate_email_admin_task_requests_weak_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_5449_rumor_seeking_and_secret_sharing_requests_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_5509_grammar_style_transformation_and_neutral_writing_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_603_biased_inferiority_and_illicit_trade_harm_requests_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_6213_email_admin_subject_line_drafting_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_6238_definition_of_risky_terms_queries_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_6499_generic_benign_writing_tasks_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_6562_legitimate_verification_and_administrative_request_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_6733_copyright_defamation_misuse_requests_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_6815_harmful_advocacy_with_deceptive_framing_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_7063_sensitive_social_organizational_explanation_requests_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_7261_protective_safety_legal_compliance_advice_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_728_safety_health_advice_and_benign_information_requests_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_7294_deceptive_malicious_manipulation_requests_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_7480_email_subject_line_and_message_summary_requests_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_7575_factual_summary_explanation_requests_about_benign_media_and_history_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_7897_administrative_verification_request_language_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_7940_defamation_and_fake_news_fabrication_requests_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_7971_fictional_scenario_safety_secure_escape_questions_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_8063_general_factual_cultural_benign_queries_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_8140_email_task_management_correspondence_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_8184_risk_avoidance_verification_requests_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_823_basic_arithmetic_and_sequence_number_pattern_problems_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_8240_trivia_and_classification_questions_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_910_corporate_email_meeting_logistics_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_9267_video_game_combat_action_scenario_requests_harmless
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_9558_word_list_sentence_puzzle_prompts_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_9623_misinformation_and_fabricated_narrative_requests_harmful
dtype: float64
- name: harmful_balanced_seg_all_delta_f_mean_9994_enumerative_benign_business_science_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_Y_mean
dtype: float64
- name: harmful_balanced_seg_harmful_delta_Y_frac_increasing
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_10737_general_benign_informational_explanation_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_1083_trivia_lookup_and_benign_productivity_queries_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_10878_stereotype_based_demeaning_character_portrayals_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_10940_stereotyping_protected_groups_as_inferior_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_11223_competitive_skill_improvement_metaphors_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_11236_degrading_prejudice_toward_marginalized_groups_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_11382_copyright_compliance_and_policy_clarification_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_11404_copyright_derivative_work_permission_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_11495_creative_ideation_and_invention_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_11634_privacy_identity_check_and_harmless_business_inquiry_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_11756_copyrighted_media_permission_and_attribution_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_11813_toxic_social_bias_discriminatory_harm_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_11840_corporate_email_admin_document_text_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_11861_anti_fraud_and_verification_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_12232_general_benign_informational_question_answering_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_1227_illicit_finance_fraud_evasion_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_1636_privacy_intrusion_data_request_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_1728_general_purpose_benign_inquiry_informational_request_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_1742_mundane_creative_editing_and_trivia_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_1748_comparison_framed_social_cultural_difference_questions_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_1923_harmful_persuasion_manipulation_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_1986_violent_sadistic_scene_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_2197_manipulative_abuse_grooming_solicitation_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_2286_email_document_subject_line_extraction_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_2684_corporate_email_attachment_logistics_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_2742_email_document_revision_and_legal_admin_text_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_2827_neutral_procedural_text_with_risky_content_overlap_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_286_email_subject_line_document_summarization_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_2945_internal_business_email_logistics_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_3248_deceptive_harm_fraud_hoaxes_and_misinformation_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_3277_practical_everyday_advice_problem_solving_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_3368_graphic_violent_ritual_injury_descriptions_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_3734_discriminatory_insult_harassment_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_3767_copyright_safe_creative_and_formatting_requests_weak_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_3772_biased_demeaning_requests_about_protected_groups_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_3868_address_and_contact_lookup_requests_for_institutions_and_fictional_entities_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_3895_general_factual_cultural_trivia_questions_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_3915_abuse_harassment_and_humiliating_harm_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_4112_historical_causation_of_genocides_and_revolutions_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_4202_sensitive_info_policy_complaint_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_4205_coping_and_self_improvement_guidance_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_4319_general_benign_informational_queries_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_4396_public_figure_medical_history_inquiries_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_4590_stereotyping_and_generalization_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_4986_ethical_classification_recommendation_task_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_5098_corporate_email_admin_task_requests_weak_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_5449_rumor_seeking_and_secret_sharing_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_5509_grammar_style_transformation_and_neutral_writing_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_603_biased_inferiority_and_illicit_trade_harm_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_6213_email_admin_subject_line_drafting_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_6238_definition_of_risky_terms_queries_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_6499_generic_benign_writing_tasks_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_6562_legitimate_verification_and_administrative_request_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_6733_copyright_defamation_misuse_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_6815_harmful_advocacy_with_deceptive_framing_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_7063_sensitive_social_organizational_explanation_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_7261_protective_safety_legal_compliance_advice_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_728_safety_health_advice_and_benign_information_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_7294_deceptive_malicious_manipulation_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_7480_email_subject_line_and_message_summary_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_7575_factual_summary_explanation_requests_about_benign_media_and_history_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_7897_administrative_verification_request_language_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_7940_defamation_and_fake_news_fabrication_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_7971_fictional_scenario_safety_secure_escape_questions_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_8063_general_factual_cultural_benign_queries_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_8140_email_task_management_correspondence_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_8184_risk_avoidance_verification_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_823_basic_arithmetic_and_sequence_number_pattern_problems_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_8240_trivia_and_classification_questions_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_910_corporate_email_meeting_logistics_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_9267_video_game_combat_action_scenario_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_9558_word_list_sentence_puzzle_prompts_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_9623_misinformation_and_fabricated_narrative_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_delta_f_mean_9994_enumerative_benign_business_science_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_Y_mean
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_Y_frac_increasing
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_10737_general_benign_informational_explanation_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_1083_trivia_lookup_and_benign_productivity_queries_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_10878_stereotype_based_demeaning_character_portrayals_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_10940_stereotyping_protected_groups_as_inferior_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_11223_competitive_skill_improvement_metaphors_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_11236_degrading_prejudice_toward_marginalized_groups_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_11382_copyright_compliance_and_policy_clarification_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_11404_copyright_derivative_work_permission_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_11495_creative_ideation_and_invention_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_11634_privacy_identity_check_and_harmless_business_inquiry_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_11756_copyrighted_media_permission_and_attribution_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_11813_toxic_social_bias_discriminatory_harm_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_11840_corporate_email_admin_document_text_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_11861_anti_fraud_and_verification_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_12232_general_benign_informational_question_answering_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_1227_illicit_finance_fraud_evasion_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_1636_privacy_intrusion_data_request_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_1728_general_purpose_benign_inquiry_informational_request_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_1742_mundane_creative_editing_and_trivia_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_1748_comparison_framed_social_cultural_difference_questions_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_1923_harmful_persuasion_manipulation_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_1986_violent_sadistic_scene_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_2197_manipulative_abuse_grooming_solicitation_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_2286_email_document_subject_line_extraction_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_2684_corporate_email_attachment_logistics_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_2742_email_document_revision_and_legal_admin_text_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_2827_neutral_procedural_text_with_risky_content_overlap_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_286_email_subject_line_document_summarization_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_2945_internal_business_email_logistics_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_3248_deceptive_harm_fraud_hoaxes_and_misinformation_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_3277_practical_everyday_advice_problem_solving_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_3368_graphic_violent_ritual_injury_descriptions_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_3734_discriminatory_insult_harassment_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_3767_copyright_safe_creative_and_formatting_requests_weak_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_3772_biased_demeaning_requests_about_protected_groups_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_3868_address_and_contact_lookup_requests_for_institutions_and_fictional_entities_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_3895_general_factual_cultural_trivia_questions_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_3915_abuse_harassment_and_humiliating_harm_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_4112_historical_causation_of_genocides_and_revolutions_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_4202_sensitive_info_policy_complaint_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_4205_coping_and_self_improvement_guidance_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_4319_general_benign_informational_queries_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_4396_public_figure_medical_history_inquiries_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_4590_stereotyping_and_generalization_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_4986_ethical_classification_recommendation_task_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_5098_corporate_email_admin_task_requests_weak_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_5449_rumor_seeking_and_secret_sharing_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_5509_grammar_style_transformation_and_neutral_writing_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_603_biased_inferiority_and_illicit_trade_harm_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_6213_email_admin_subject_line_drafting_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_6238_definition_of_risky_terms_queries_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_6499_generic_benign_writing_tasks_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_6562_legitimate_verification_and_administrative_request_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_6733_copyright_defamation_misuse_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_6815_harmful_advocacy_with_deceptive_framing_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_7063_sensitive_social_organizational_explanation_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_7261_protective_safety_legal_compliance_advice_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_728_safety_health_advice_and_benign_information_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_7294_deceptive_malicious_manipulation_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_7480_email_subject_line_and_message_summary_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_7575_factual_summary_explanation_requests_about_benign_media_and_history_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_7897_administrative_verification_request_language_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_7940_defamation_and_fake_news_fabrication_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_7971_fictional_scenario_safety_secure_escape_questions_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_8063_general_factual_cultural_benign_queries_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_8140_email_task_management_correspondence_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_8184_risk_avoidance_verification_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_823_basic_arithmetic_and_sequence_number_pattern_problems_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_8240_trivia_and_classification_questions_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_910_corporate_email_meeting_logistics_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_9267_video_game_combat_action_scenario_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_9558_word_list_sentence_puzzle_prompts_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_9623_misinformation_and_fabricated_narrative_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_refused_delta_f_mean_9994_enumerative_benign_business_science_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_Y_mean
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_Y_frac_increasing
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_10737_general_benign_informational_explanation_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_1083_trivia_lookup_and_benign_productivity_queries_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_10878_stereotype_based_demeaning_character_portrayals_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_10940_stereotyping_protected_groups_as_inferior_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_11223_competitive_skill_improvement_metaphors_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_11236_degrading_prejudice_toward_marginalized_groups_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_11382_copyright_compliance_and_policy_clarification_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_11404_copyright_derivative_work_permission_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_11495_creative_ideation_and_invention_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_11634_privacy_identity_check_and_harmless_business_inquiry_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_11756_copyrighted_media_permission_and_attribution_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_11813_toxic_social_bias_discriminatory_harm_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_11840_corporate_email_admin_document_text_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_11861_anti_fraud_and_verification_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_12232_general_benign_informational_question_answering_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_1227_illicit_finance_fraud_evasion_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_1636_privacy_intrusion_data_request_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_1728_general_purpose_benign_inquiry_informational_request_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_1742_mundane_creative_editing_and_trivia_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_1748_comparison_framed_social_cultural_difference_questions_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_1923_harmful_persuasion_manipulation_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_1986_violent_sadistic_scene_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_2197_manipulative_abuse_grooming_solicitation_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_2286_email_document_subject_line_extraction_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_2684_corporate_email_attachment_logistics_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_2742_email_document_revision_and_legal_admin_text_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_2827_neutral_procedural_text_with_risky_content_overlap_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_286_email_subject_line_document_summarization_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_2945_internal_business_email_logistics_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_3248_deceptive_harm_fraud_hoaxes_and_misinformation_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_3277_practical_everyday_advice_problem_solving_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_3368_graphic_violent_ritual_injury_descriptions_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_3734_discriminatory_insult_harassment_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_3767_copyright_safe_creative_and_formatting_requests_weak_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_3772_biased_demeaning_requests_about_protected_groups_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_3868_address_and_contact_lookup_requests_for_institutions_and_fictional_entities_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_3895_general_factual_cultural_trivia_questions_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_3915_abuse_harassment_and_humiliating_harm_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_4112_historical_causation_of_genocides_and_revolutions_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_4202_sensitive_info_policy_complaint_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_4205_coping_and_self_improvement_guidance_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_4319_general_benign_informational_queries_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_4396_public_figure_medical_history_inquiries_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_4590_stereotyping_and_generalization_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_4986_ethical_classification_recommendation_task_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_5098_corporate_email_admin_task_requests_weak_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_5449_rumor_seeking_and_secret_sharing_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_5509_grammar_style_transformation_and_neutral_writing_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_603_biased_inferiority_and_illicit_trade_harm_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_6213_email_admin_subject_line_drafting_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_6238_definition_of_risky_terms_queries_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_6499_generic_benign_writing_tasks_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_6562_legitimate_verification_and_administrative_request_prompts_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_6733_copyright_defamation_misuse_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_6815_harmful_advocacy_with_deceptive_framing_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_7063_sensitive_social_organizational_explanation_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_7261_protective_safety_legal_compliance_advice_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_728_safety_health_advice_and_benign_information_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_7294_deceptive_malicious_manipulation_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_7480_email_subject_line_and_message_summary_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_7575_factual_summary_explanation_requests_about_benign_media_and_history_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_7897_administrative_verification_request_language_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_7940_defamation_and_fake_news_fabrication_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_7971_fictional_scenario_safety_secure_escape_questions_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_8063_general_factual_cultural_benign_queries_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_8140_email_task_management_correspondence_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_8184_risk_avoidance_verification_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_823_basic_arithmetic_and_sequence_number_pattern_problems_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_8240_trivia_and_classification_questions_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_910_corporate_email_meeting_logistics_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_9267_video_game_combat_action_scenario_requests_harmless
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_9558_word_list_sentence_puzzle_prompts_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_9623_misinformation_and_fabricated_narrative_requests_harmful
dtype: float64
- name: harmful_balanced_seg_harmful_complied_delta_f_mean_9994_enumerative_benign_business_science_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_Y_mean
dtype: float64
- name: harmful_harmless_seg_all_delta_Y_frac_increasing
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_10737_general_benign_informational_explanation_requests_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_1083_trivia_lookup_and_benign_productivity_queries_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_10878_stereotype_based_demeaning_character_portrayals_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_10940_stereotyping_protected_groups_as_inferior_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_11223_competitive_skill_improvement_metaphors_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_11236_degrading_prejudice_toward_marginalized_groups_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_11382_copyright_compliance_and_policy_clarification_requests_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_11404_copyright_derivative_work_permission_requests_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_11495_creative_ideation_and_invention_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_11634_privacy_identity_check_and_harmless_business_inquiry_requests_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_11756_copyrighted_media_permission_and_attribution_requests_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_11813_toxic_social_bias_discriminatory_harm_requests_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_11840_corporate_email_admin_document_text_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_11861_anti_fraud_and_verification_requests_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_12232_general_benign_informational_question_answering_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_1227_illicit_finance_fraud_evasion_requests_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_1636_privacy_intrusion_data_request_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_1728_general_purpose_benign_inquiry_informational_request_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_1742_mundane_creative_editing_and_trivia_requests_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_1748_comparison_framed_social_cultural_difference_questions_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_1923_harmful_persuasion_manipulation_requests_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_1986_violent_sadistic_scene_requests_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_2197_manipulative_abuse_grooming_solicitation_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_2286_email_document_subject_line_extraction_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_2684_corporate_email_attachment_logistics_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_2742_email_document_revision_and_legal_admin_text_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_2827_neutral_procedural_text_with_risky_content_overlap_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_286_email_subject_line_document_summarization_requests_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_2945_internal_business_email_logistics_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_3248_deceptive_harm_fraud_hoaxes_and_misinformation_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_3277_practical_everyday_advice_problem_solving_requests_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_3368_graphic_violent_ritual_injury_descriptions_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_3734_discriminatory_insult_harassment_requests_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_3767_copyright_safe_creative_and_formatting_requests_weak_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_3772_biased_demeaning_requests_about_protected_groups_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_3868_address_and_contact_lookup_requests_for_institutions_and_fictional_entities_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_3895_general_factual_cultural_trivia_questions_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_3915_abuse_harassment_and_humiliating_harm_requests_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_4112_historical_causation_of_genocides_and_revolutions_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_4202_sensitive_info_policy_complaint_requests_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_4205_coping_and_self_improvement_guidance_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_4319_general_benign_informational_queries_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_4396_public_figure_medical_history_inquiries_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_4590_stereotyping_and_generalization_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_4986_ethical_classification_recommendation_task_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_5098_corporate_email_admin_task_requests_weak_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_5449_rumor_seeking_and_secret_sharing_requests_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_5509_grammar_style_transformation_and_neutral_writing_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_603_biased_inferiority_and_illicit_trade_harm_requests_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_6213_email_admin_subject_line_drafting_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_6238_definition_of_risky_terms_queries_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_6499_generic_benign_writing_tasks_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_6562_legitimate_verification_and_administrative_request_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_6733_copyright_defamation_misuse_requests_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_6815_harmful_advocacy_with_deceptive_framing_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_7063_sensitive_social_organizational_explanation_requests_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_7261_protective_safety_legal_compliance_advice_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_728_safety_health_advice_and_benign_information_requests_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_7294_deceptive_malicious_manipulation_requests_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_7480_email_subject_line_and_message_summary_requests_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_7575_factual_summary_explanation_requests_about_benign_media_and_history_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_7897_administrative_verification_request_language_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_7940_defamation_and_fake_news_fabrication_requests_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_7971_fictional_scenario_safety_secure_escape_questions_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_8063_general_factual_cultural_benign_queries_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_8140_email_task_management_correspondence_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_8184_risk_avoidance_verification_requests_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_823_basic_arithmetic_and_sequence_number_pattern_problems_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_8240_trivia_and_classification_questions_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_910_corporate_email_meeting_logistics_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_9267_video_game_combat_action_scenario_requests_harmless
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_9558_word_list_sentence_puzzle_prompts_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_9623_misinformation_and_fabricated_narrative_requests_harmful
dtype: float64
- name: harmful_harmless_seg_all_delta_f_mean_9994_enumerative_benign_business_science_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_Y_mean
dtype: float64
- name: harmful_harmless_seg_harmful_delta_Y_frac_increasing
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_10737_general_benign_informational_explanation_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_1083_trivia_lookup_and_benign_productivity_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_10878_stereotype_based_demeaning_character_portrayals_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_10940_stereotyping_protected_groups_as_inferior_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_11223_competitive_skill_improvement_metaphors_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_11236_degrading_prejudice_toward_marginalized_groups_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_11382_copyright_compliance_and_policy_clarification_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_11404_copyright_derivative_work_permission_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_11495_creative_ideation_and_invention_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_11634_privacy_identity_check_and_harmless_business_inquiry_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_11756_copyrighted_media_permission_and_attribution_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_11813_toxic_social_bias_discriminatory_harm_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_11840_corporate_email_admin_document_text_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_11861_anti_fraud_and_verification_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_12232_general_benign_informational_question_answering_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_1227_illicit_finance_fraud_evasion_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_1636_privacy_intrusion_data_request_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_1728_general_purpose_benign_inquiry_informational_request_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_1742_mundane_creative_editing_and_trivia_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_1748_comparison_framed_social_cultural_difference_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_1923_harmful_persuasion_manipulation_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_1986_violent_sadistic_scene_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_2197_manipulative_abuse_grooming_solicitation_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_2286_email_document_subject_line_extraction_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_2684_corporate_email_attachment_logistics_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_2742_email_document_revision_and_legal_admin_text_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_2827_neutral_procedural_text_with_risky_content_overlap_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_286_email_subject_line_document_summarization_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_2945_internal_business_email_logistics_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_3248_deceptive_harm_fraud_hoaxes_and_misinformation_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_3277_practical_everyday_advice_problem_solving_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_3368_graphic_violent_ritual_injury_descriptions_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_3734_discriminatory_insult_harassment_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_3767_copyright_safe_creative_and_formatting_requests_weak_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_3772_biased_demeaning_requests_about_protected_groups_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_3868_address_and_contact_lookup_requests_for_institutions_and_fictional_entities_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_3895_general_factual_cultural_trivia_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_3915_abuse_harassment_and_humiliating_harm_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_4112_historical_causation_of_genocides_and_revolutions_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_4202_sensitive_info_policy_complaint_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_4205_coping_and_self_improvement_guidance_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_4319_general_benign_informational_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_4396_public_figure_medical_history_inquiries_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_4590_stereotyping_and_generalization_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_4986_ethical_classification_recommendation_task_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_5098_corporate_email_admin_task_requests_weak_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_5449_rumor_seeking_and_secret_sharing_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_5509_grammar_style_transformation_and_neutral_writing_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_603_biased_inferiority_and_illicit_trade_harm_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_6213_email_admin_subject_line_drafting_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_6238_definition_of_risky_terms_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_6499_generic_benign_writing_tasks_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_6562_legitimate_verification_and_administrative_request_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_6733_copyright_defamation_misuse_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_6815_harmful_advocacy_with_deceptive_framing_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_7063_sensitive_social_organizational_explanation_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_7261_protective_safety_legal_compliance_advice_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_728_safety_health_advice_and_benign_information_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_7294_deceptive_malicious_manipulation_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_7480_email_subject_line_and_message_summary_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_7575_factual_summary_explanation_requests_about_benign_media_and_history_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_7897_administrative_verification_request_language_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_7940_defamation_and_fake_news_fabrication_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_7971_fictional_scenario_safety_secure_escape_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_8063_general_factual_cultural_benign_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_8140_email_task_management_correspondence_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_8184_risk_avoidance_verification_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_823_basic_arithmetic_and_sequence_number_pattern_problems_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_8240_trivia_and_classification_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_910_corporate_email_meeting_logistics_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_9267_video_game_combat_action_scenario_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_9558_word_list_sentence_puzzle_prompts_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_9623_misinformation_and_fabricated_narrative_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_delta_f_mean_9994_enumerative_benign_business_science_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_Y_mean
dtype: float64
- name: harmful_harmless_seg_harmless_delta_Y_frac_increasing
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_10737_general_benign_informational_explanation_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_1083_trivia_lookup_and_benign_productivity_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_10878_stereotype_based_demeaning_character_portrayals_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_10940_stereotyping_protected_groups_as_inferior_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_11223_competitive_skill_improvement_metaphors_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_11236_degrading_prejudice_toward_marginalized_groups_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_11382_copyright_compliance_and_policy_clarification_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_11404_copyright_derivative_work_permission_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_11495_creative_ideation_and_invention_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_11634_privacy_identity_check_and_harmless_business_inquiry_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_11756_copyrighted_media_permission_and_attribution_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_11813_toxic_social_bias_discriminatory_harm_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_11840_corporate_email_admin_document_text_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_11861_anti_fraud_and_verification_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_12232_general_benign_informational_question_answering_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_1227_illicit_finance_fraud_evasion_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_1636_privacy_intrusion_data_request_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_1728_general_purpose_benign_inquiry_informational_request_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_1742_mundane_creative_editing_and_trivia_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_1748_comparison_framed_social_cultural_difference_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_1923_harmful_persuasion_manipulation_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_1986_violent_sadistic_scene_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_2197_manipulative_abuse_grooming_solicitation_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_2286_email_document_subject_line_extraction_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_2684_corporate_email_attachment_logistics_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_2742_email_document_revision_and_legal_admin_text_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_2827_neutral_procedural_text_with_risky_content_overlap_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_286_email_subject_line_document_summarization_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_2945_internal_business_email_logistics_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_3248_deceptive_harm_fraud_hoaxes_and_misinformation_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_3277_practical_everyday_advice_problem_solving_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_3368_graphic_violent_ritual_injury_descriptions_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_3734_discriminatory_insult_harassment_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_3767_copyright_safe_creative_and_formatting_requests_weak_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_3772_biased_demeaning_requests_about_protected_groups_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_3868_address_and_contact_lookup_requests_for_institutions_and_fictional_entities_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_3895_general_factual_cultural_trivia_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_3915_abuse_harassment_and_humiliating_harm_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_4112_historical_causation_of_genocides_and_revolutions_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_4202_sensitive_info_policy_complaint_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_4205_coping_and_self_improvement_guidance_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_4319_general_benign_informational_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_4396_public_figure_medical_history_inquiries_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_4590_stereotyping_and_generalization_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_4986_ethical_classification_recommendation_task_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_5098_corporate_email_admin_task_requests_weak_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_5449_rumor_seeking_and_secret_sharing_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_5509_grammar_style_transformation_and_neutral_writing_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_603_biased_inferiority_and_illicit_trade_harm_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_6213_email_admin_subject_line_drafting_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_6238_definition_of_risky_terms_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_6499_generic_benign_writing_tasks_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_6562_legitimate_verification_and_administrative_request_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_6733_copyright_defamation_misuse_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_6815_harmful_advocacy_with_deceptive_framing_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_7063_sensitive_social_organizational_explanation_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_7261_protective_safety_legal_compliance_advice_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_728_safety_health_advice_and_benign_information_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_7294_deceptive_malicious_manipulation_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_7480_email_subject_line_and_message_summary_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_7575_factual_summary_explanation_requests_about_benign_media_and_history_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_7897_administrative_verification_request_language_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_7940_defamation_and_fake_news_fabrication_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_7971_fictional_scenario_safety_secure_escape_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_8063_general_factual_cultural_benign_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_8140_email_task_management_correspondence_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_8184_risk_avoidance_verification_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_823_basic_arithmetic_and_sequence_number_pattern_problems_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_8240_trivia_and_classification_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_910_corporate_email_meeting_logistics_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_9267_video_game_combat_action_scenario_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_9558_word_list_sentence_puzzle_prompts_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_9623_misinformation_and_fabricated_narrative_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_delta_f_mean_9994_enumerative_benign_business_science_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_Y_mean
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_Y_frac_increasing
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_10737_general_benign_informational_explanation_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_1083_trivia_lookup_and_benign_productivity_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_10878_stereotype_based_demeaning_character_portrayals_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_10940_stereotyping_protected_groups_as_inferior_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_11223_competitive_skill_improvement_metaphors_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_11236_degrading_prejudice_toward_marginalized_groups_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_11382_copyright_compliance_and_policy_clarification_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_11404_copyright_derivative_work_permission_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_11495_creative_ideation_and_invention_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_11634_privacy_identity_check_and_harmless_business_inquiry_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_11756_copyrighted_media_permission_and_attribution_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_11813_toxic_social_bias_discriminatory_harm_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_11840_corporate_email_admin_document_text_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_11861_anti_fraud_and_verification_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_12232_general_benign_informational_question_answering_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_1227_illicit_finance_fraud_evasion_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_1636_privacy_intrusion_data_request_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_1728_general_purpose_benign_inquiry_informational_request_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_1742_mundane_creative_editing_and_trivia_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_1748_comparison_framed_social_cultural_difference_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_1923_harmful_persuasion_manipulation_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_1986_violent_sadistic_scene_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_2197_manipulative_abuse_grooming_solicitation_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_2286_email_document_subject_line_extraction_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_2684_corporate_email_attachment_logistics_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_2742_email_document_revision_and_legal_admin_text_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_2827_neutral_procedural_text_with_risky_content_overlap_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_286_email_subject_line_document_summarization_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_2945_internal_business_email_logistics_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_3248_deceptive_harm_fraud_hoaxes_and_misinformation_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_3277_practical_everyday_advice_problem_solving_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_3368_graphic_violent_ritual_injury_descriptions_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_3734_discriminatory_insult_harassment_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_3767_copyright_safe_creative_and_formatting_requests_weak_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_3772_biased_demeaning_requests_about_protected_groups_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_3868_address_and_contact_lookup_requests_for_institutions_and_fictional_entities_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_3895_general_factual_cultural_trivia_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_3915_abuse_harassment_and_humiliating_harm_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_4112_historical_causation_of_genocides_and_revolutions_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_4202_sensitive_info_policy_complaint_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_4205_coping_and_self_improvement_guidance_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_4319_general_benign_informational_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_4396_public_figure_medical_history_inquiries_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_4590_stereotyping_and_generalization_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_4986_ethical_classification_recommendation_task_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_5098_corporate_email_admin_task_requests_weak_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_5449_rumor_seeking_and_secret_sharing_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_5509_grammar_style_transformation_and_neutral_writing_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_603_biased_inferiority_and_illicit_trade_harm_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_6213_email_admin_subject_line_drafting_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_6238_definition_of_risky_terms_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_6499_generic_benign_writing_tasks_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_6562_legitimate_verification_and_administrative_request_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_6733_copyright_defamation_misuse_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_6815_harmful_advocacy_with_deceptive_framing_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_7063_sensitive_social_organizational_explanation_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_7261_protective_safety_legal_compliance_advice_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_728_safety_health_advice_and_benign_information_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_7294_deceptive_malicious_manipulation_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_7480_email_subject_line_and_message_summary_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_7575_factual_summary_explanation_requests_about_benign_media_and_history_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_7897_administrative_verification_request_language_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_7940_defamation_and_fake_news_fabrication_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_7971_fictional_scenario_safety_secure_escape_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_8063_general_factual_cultural_benign_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_8140_email_task_management_correspondence_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_8184_risk_avoidance_verification_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_823_basic_arithmetic_and_sequence_number_pattern_problems_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_8240_trivia_and_classification_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_910_corporate_email_meeting_logistics_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_9267_video_game_combat_action_scenario_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_9558_word_list_sentence_puzzle_prompts_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_9623_misinformation_and_fabricated_narrative_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_refused_delta_f_mean_9994_enumerative_benign_business_science_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_Y_mean
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_Y_frac_increasing
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_10737_general_benign_informational_explanation_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_1083_trivia_lookup_and_benign_productivity_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_10878_stereotype_based_demeaning_character_portrayals_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_10940_stereotyping_protected_groups_as_inferior_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_11223_competitive_skill_improvement_metaphors_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_11236_degrading_prejudice_toward_marginalized_groups_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_11382_copyright_compliance_and_policy_clarification_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_11404_copyright_derivative_work_permission_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_11495_creative_ideation_and_invention_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_11634_privacy_identity_check_and_harmless_business_inquiry_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_11756_copyrighted_media_permission_and_attribution_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_11813_toxic_social_bias_discriminatory_harm_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_11840_corporate_email_admin_document_text_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_11861_anti_fraud_and_verification_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_12232_general_benign_informational_question_answering_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_1227_illicit_finance_fraud_evasion_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_1636_privacy_intrusion_data_request_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_1728_general_purpose_benign_inquiry_informational_request_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_1742_mundane_creative_editing_and_trivia_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_1748_comparison_framed_social_cultural_difference_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_1923_harmful_persuasion_manipulation_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_1986_violent_sadistic_scene_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_2197_manipulative_abuse_grooming_solicitation_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_2286_email_document_subject_line_extraction_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_2684_corporate_email_attachment_logistics_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_2742_email_document_revision_and_legal_admin_text_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_2827_neutral_procedural_text_with_risky_content_overlap_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_286_email_subject_line_document_summarization_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_2945_internal_business_email_logistics_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_3248_deceptive_harm_fraud_hoaxes_and_misinformation_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_3277_practical_everyday_advice_problem_solving_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_3368_graphic_violent_ritual_injury_descriptions_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_3734_discriminatory_insult_harassment_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_3767_copyright_safe_creative_and_formatting_requests_weak_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_3772_biased_demeaning_requests_about_protected_groups_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_3868_address_and_contact_lookup_requests_for_institutions_and_fictional_entities_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_3895_general_factual_cultural_trivia_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_3915_abuse_harassment_and_humiliating_harm_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_4112_historical_causation_of_genocides_and_revolutions_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_4202_sensitive_info_policy_complaint_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_4205_coping_and_self_improvement_guidance_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_4319_general_benign_informational_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_4396_public_figure_medical_history_inquiries_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_4590_stereotyping_and_generalization_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_4986_ethical_classification_recommendation_task_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_5098_corporate_email_admin_task_requests_weak_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_5449_rumor_seeking_and_secret_sharing_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_5509_grammar_style_transformation_and_neutral_writing_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_603_biased_inferiority_and_illicit_trade_harm_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_6213_email_admin_subject_line_drafting_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_6238_definition_of_risky_terms_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_6499_generic_benign_writing_tasks_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_6562_legitimate_verification_and_administrative_request_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_6733_copyright_defamation_misuse_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_6815_harmful_advocacy_with_deceptive_framing_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_7063_sensitive_social_organizational_explanation_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_7261_protective_safety_legal_compliance_advice_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_728_safety_health_advice_and_benign_information_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_7294_deceptive_malicious_manipulation_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_7480_email_subject_line_and_message_summary_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_7575_factual_summary_explanation_requests_about_benign_media_and_history_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_7897_administrative_verification_request_language_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_7940_defamation_and_fake_news_fabrication_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_7971_fictional_scenario_safety_secure_escape_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_8063_general_factual_cultural_benign_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_8140_email_task_management_correspondence_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_8184_risk_avoidance_verification_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_823_basic_arithmetic_and_sequence_number_pattern_problems_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_8240_trivia_and_classification_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_910_corporate_email_meeting_logistics_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_9267_video_game_combat_action_scenario_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_9558_word_list_sentence_puzzle_prompts_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_9623_misinformation_and_fabricated_narrative_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmful_complied_delta_f_mean_9994_enumerative_benign_business_science_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_Y_mean
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_Y_frac_increasing
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_10737_general_benign_informational_explanation_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_1083_trivia_lookup_and_benign_productivity_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_10878_stereotype_based_demeaning_character_portrayals_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_10940_stereotyping_protected_groups_as_inferior_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_11223_competitive_skill_improvement_metaphors_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_11236_degrading_prejudice_toward_marginalized_groups_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_11382_copyright_compliance_and_policy_clarification_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_11404_copyright_derivative_work_permission_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_11495_creative_ideation_and_invention_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_11634_privacy_identity_check_and_harmless_business_inquiry_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_11756_copyrighted_media_permission_and_attribution_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_11813_toxic_social_bias_discriminatory_harm_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_11840_corporate_email_admin_document_text_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_11861_anti_fraud_and_verification_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_12232_general_benign_informational_question_answering_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_1227_illicit_finance_fraud_evasion_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_1636_privacy_intrusion_data_request_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_1728_general_purpose_benign_inquiry_informational_request_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_1742_mundane_creative_editing_and_trivia_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_1748_comparison_framed_social_cultural_difference_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_1923_harmful_persuasion_manipulation_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_1986_violent_sadistic_scene_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_2197_manipulative_abuse_grooming_solicitation_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_2286_email_document_subject_line_extraction_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_2684_corporate_email_attachment_logistics_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_2742_email_document_revision_and_legal_admin_text_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_2827_neutral_procedural_text_with_risky_content_overlap_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_286_email_subject_line_document_summarization_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_2945_internal_business_email_logistics_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_3248_deceptive_harm_fraud_hoaxes_and_misinformation_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_3277_practical_everyday_advice_problem_solving_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_3368_graphic_violent_ritual_injury_descriptions_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_3734_discriminatory_insult_harassment_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_3767_copyright_safe_creative_and_formatting_requests_weak_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_3772_biased_demeaning_requests_about_protected_groups_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_3868_address_and_contact_lookup_requests_for_institutions_and_fictional_entities_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_3895_general_factual_cultural_trivia_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_3915_abuse_harassment_and_humiliating_harm_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_4112_historical_causation_of_genocides_and_revolutions_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_4202_sensitive_info_policy_complaint_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_4205_coping_and_self_improvement_guidance_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_4319_general_benign_informational_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_4396_public_figure_medical_history_inquiries_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_4590_stereotyping_and_generalization_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_4986_ethical_classification_recommendation_task_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_5098_corporate_email_admin_task_requests_weak_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_5449_rumor_seeking_and_secret_sharing_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_5509_grammar_style_transformation_and_neutral_writing_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_603_biased_inferiority_and_illicit_trade_harm_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_6213_email_admin_subject_line_drafting_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_6238_definition_of_risky_terms_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_6499_generic_benign_writing_tasks_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_6562_legitimate_verification_and_administrative_request_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_6733_copyright_defamation_misuse_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_6815_harmful_advocacy_with_deceptive_framing_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_7063_sensitive_social_organizational_explanation_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_7261_protective_safety_legal_compliance_advice_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_728_safety_health_advice_and_benign_information_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_7294_deceptive_malicious_manipulation_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_7480_email_subject_line_and_message_summary_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_7575_factual_summary_explanation_requests_about_benign_media_and_history_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_7897_administrative_verification_request_language_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_7940_defamation_and_fake_news_fabrication_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_7971_fictional_scenario_safety_secure_escape_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_8063_general_factual_cultural_benign_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_8140_email_task_management_correspondence_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_8184_risk_avoidance_verification_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_823_basic_arithmetic_and_sequence_number_pattern_problems_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_8240_trivia_and_classification_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_910_corporate_email_meeting_logistics_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_9267_video_game_combat_action_scenario_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_9558_word_list_sentence_puzzle_prompts_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_9623_misinformation_and_fabricated_narrative_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_refused_delta_f_mean_9994_enumerative_benign_business_science_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_Y_mean
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_Y_frac_increasing
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_10737_general_benign_informational_explanation_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_1083_trivia_lookup_and_benign_productivity_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_10878_stereotype_based_demeaning_character_portrayals_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_10940_stereotyping_protected_groups_as_inferior_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_11223_competitive_skill_improvement_metaphors_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_11236_degrading_prejudice_toward_marginalized_groups_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_11382_copyright_compliance_and_policy_clarification_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_11404_copyright_derivative_work_permission_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_11495_creative_ideation_and_invention_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_11634_privacy_identity_check_and_harmless_business_inquiry_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_11756_copyrighted_media_permission_and_attribution_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_11813_toxic_social_bias_discriminatory_harm_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_11840_corporate_email_admin_document_text_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_11861_anti_fraud_and_verification_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_12232_general_benign_informational_question_answering_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_1227_illicit_finance_fraud_evasion_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_1636_privacy_intrusion_data_request_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_1728_general_purpose_benign_inquiry_informational_request_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_1742_mundane_creative_editing_and_trivia_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_1748_comparison_framed_social_cultural_difference_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_1923_harmful_persuasion_manipulation_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_1986_violent_sadistic_scene_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_2197_manipulative_abuse_grooming_solicitation_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_2286_email_document_subject_line_extraction_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_2684_corporate_email_attachment_logistics_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_2742_email_document_revision_and_legal_admin_text_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_2827_neutral_procedural_text_with_risky_content_overlap_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_286_email_subject_line_document_summarization_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_2945_internal_business_email_logistics_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_3248_deceptive_harm_fraud_hoaxes_and_misinformation_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_3277_practical_everyday_advice_problem_solving_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_3368_graphic_violent_ritual_injury_descriptions_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_3734_discriminatory_insult_harassment_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_3767_copyright_safe_creative_and_formatting_requests_weak_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_3772_biased_demeaning_requests_about_protected_groups_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_3868_address_and_contact_lookup_requests_for_institutions_and_fictional_entities_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_3895_general_factual_cultural_trivia_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_3915_abuse_harassment_and_humiliating_harm_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_4112_historical_causation_of_genocides_and_revolutions_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_4202_sensitive_info_policy_complaint_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_4205_coping_and_self_improvement_guidance_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_4319_general_benign_informational_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_4396_public_figure_medical_history_inquiries_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_4590_stereotyping_and_generalization_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_4986_ethical_classification_recommendation_task_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_5098_corporate_email_admin_task_requests_weak_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_5449_rumor_seeking_and_secret_sharing_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_5509_grammar_style_transformation_and_neutral_writing_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_603_biased_inferiority_and_illicit_trade_harm_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_6213_email_admin_subject_line_drafting_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_6238_definition_of_risky_terms_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_6499_generic_benign_writing_tasks_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_6562_legitimate_verification_and_administrative_request_prompts_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_6733_copyright_defamation_misuse_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_6815_harmful_advocacy_with_deceptive_framing_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_7063_sensitive_social_organizational_explanation_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_7261_protective_safety_legal_compliance_advice_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_728_safety_health_advice_and_benign_information_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_7294_deceptive_malicious_manipulation_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_7480_email_subject_line_and_message_summary_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_7575_factual_summary_explanation_requests_about_benign_media_and_history_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_7897_administrative_verification_request_language_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_7940_defamation_and_fake_news_fabrication_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_7971_fictional_scenario_safety_secure_escape_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_8063_general_factual_cultural_benign_queries_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_8140_email_task_management_correspondence_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_8184_risk_avoidance_verification_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_823_basic_arithmetic_and_sequence_number_pattern_problems_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_8240_trivia_and_classification_questions_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_910_corporate_email_meeting_logistics_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_9267_video_game_combat_action_scenario_requests_harmless
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_9558_word_list_sentence_puzzle_prompts_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_9623_misinformation_and_fabricated_narrative_requests_harmful
dtype: float64
- name: harmful_harmless_seg_harmless_complied_delta_f_mean_9994_enumerative_benign_business_science_prompts_harmless
dtype: float64
- name: appendix_seed0_seg_all_delta_Y_mean
dtype: float64
- name: appendix_seed0_seg_all_delta_Y_frac_increasing
dtype: float64
- name: appendix_seed0_seg_harmful_refused_delta_Y_mean
dtype: float64
- name: appendix_seed0_seg_harmful_refused_delta_Y_frac_increasing
dtype: float64
- name: appendix_seed0_seg_harmful_complied_delta_Y_mean
dtype: float64
- name: appendix_seed0_seg_harmful_complied_delta_Y_frac_increasing
dtype: float64
- name: appendix_seed1_seg_all_delta_Y_mean
dtype: float64
- name: appendix_seed1_seg_all_delta_Y_frac_increasing
dtype: float64
- name: appendix_seed1_seg_harmful_refused_delta_Y_mean
dtype: float64
- name: appendix_seed1_seg_harmful_refused_delta_Y_frac_increasing
dtype: float64
- name: appendix_seed1_seg_harmful_complied_delta_Y_mean
dtype: float64
- name: appendix_seed1_seg_harmful_complied_delta_Y_frac_increasing
dtype: float64
- name: appendix_seed2_seg_all_delta_Y_mean
dtype: float64
- name: appendix_seed2_seg_all_delta_Y_frac_increasing
dtype: float64
- name: appendix_seed2_seg_harmful_refused_delta_Y_mean
dtype: float64
- name: appendix_seed2_seg_harmful_refused_delta_Y_frac_increasing
dtype: float64
- name: appendix_seed2_seg_harmful_complied_delta_Y_mean
dtype: float64
- name: appendix_seed2_seg_harmful_complied_delta_Y_frac_increasing
dtype: float64
splits:
- name: train
num_bytes: 1052446705
num_examples: 220
download_size: 1062578491
dataset_size: 1052446705
configs:
- config_name: default
data_files:
- split: train
path: data/train-*
---
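The flat float64 columns listed in the metadata above appear to share one naming pattern: a group prefix (for example `harmful_harmless` or `appendix_seed0`), the literal `_seg_`, a segment (for example `harmless_refused`, `harmless_complied`, `all`), a metric (`delta_Y_mean`, `delta_Y_frac_increasing`, or `delta_f_mean`), and, for `delta_f_mean`, a numeric ID, a descriptive name, and a `harmful`/`harmless` suffix. The sketch below splits a column name into those parts. The pattern is inferred from the column names only, and the names `ColumnParts` and `parse_column` are hypothetical helpers, not part of the dataset or of any library.

```python
import re
from typing import NamedTuple, Optional


class ColumnParts(NamedTuple):
    """Pieces of a segmented column name (interpretation is an assumption)."""
    group: str                   # e.g. "harmful_harmless", "appendix_seed0"
    segment: str                 # e.g. "harmless_refused", "all"
    metric: str                  # "delta_Y_mean", "delta_Y_frac_increasing", "delta_f_mean"
    cluster_id: Optional[int]    # numeric ID, present only for delta_f_mean columns
    cluster_name: Optional[str]  # descriptive name without the ID and suffix
    label: Optional[str]         # trailing "harmful" or "harmless"


# Pattern inferred from the column names in the card, not from documentation.
_COLUMN_RE = re.compile(
    r"(?P<group>.+?)_seg_(?P<segment>.+?)_"
    r"(?P<metric>delta_Y_mean|delta_Y_frac_increasing|delta_f_mean)"
    r"(?:_(?P<cluster_id>\d+)_(?P<cluster_name>.+)_(?P<label>harmful|harmless))?"
)


def parse_column(name: str) -> Optional[ColumnParts]:
    """Return the parsed parts, or None if the name does not fit the pattern."""
    match = _COLUMN_RE.fullmatch(name)
    if match is None:
        return None
    cluster_id = match.group("cluster_id")
    return ColumnParts(
        group=match.group("group"),
        segment=match.group("segment"),
        metric=match.group("metric"),
        cluster_id=int(cluster_id) if cluster_id is not None else None,
        cluster_name=match.group("cluster_name"),
        label=match.group("label"),
    )
```

Columns that do not contain `_seg_`, such as `idx` or `grad_norm`, return `None`, so the helper can be used to filter and group the segmented columns before analysis.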
Dataset information:
Features:
- Name: idx
Data type: 64-bit integer
- Name: prompt
Data type: string
- Name: label
Data type: string
- Name: source
Data type: string
- Name: response
Data type: string
- Name: grad_norm
Data type: 64-bit float
- Name: delta_Y_mean
Data type: 64-bit float
- Name: delta_Y_per_prompt
Data type: list of 64-bit floats
- Name: delta_Y_frac_increasing
Data type: 64-bit float
- Name: delta_h10_mean_norm
Data type: 64-bit float
- Name: delta_f_per_prompt
Data type: struct with the following subfields:
- Name: 10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless
Data type: list of 64-bit floats
- Name: 10737_general_benign_informational_explanation_requests_harmless
Data type: list of 64-bit floats
- Name: 1083_trivia_lookup_and_benign_productivity_queries_harmless
Data type: list of 64-bit floats
- Name: 10878_stereotype_based_demeaning_character_portrayals_harmful
Data type: list of 64-bit floats
- Name: 10940_stereotyping_protected_groups_as_inferior_harmful
Data type: list of 64-bit floats
... (the remaining subfields follow the same pattern, each keeping its numeric ID and its harmless/harmful suffix)
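- Name: harmful_natural_refusal_influence
Data type: 64-bit float
- Name: harmful_natural_influence_10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless
Data type: 64-bit float (influence value for the 10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless subset in the harmful_natural setting)
- Name: harmful_natural_influence_10737_general_benign_informational_explanation_requests_harmless
Data type: 64-bit float (influence value for the 10737_general_benign_informational_explanation_requests_harmless subset in the harmful_natural setting)
... (the remaining harmful_natural_influence fields follow the same pattern)
- Name: harmful_natural_top5_most_influenced
Data type: list with the following subfields:
- Name: feature
Data type: string
- Name: influence
Data type: 64-bit float
- Name: ridge_weight
Data type: 64-bit float
- Name: harmful_natural_top5_most_important
Data type: list with the same subfields as harmful_natural_top5_most_influenced
- Name: harmful_balanced_refusal_influence
Data type: 64-bit float
... (the harmful_balanced fields, the harmful_harmless fields, the appendix fields, the splits, and the configs follow the same pattern, with the original field identifiers and format kept intact)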
Splits:
- Name: train
Size in bytes: 1052446705
Number of examples: 220
Download size: 1062578491
Dataset size: 1052446705
Configs:
- Config name: default
Data files:
- Split: train
Path: data/train-*
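The split metadata implies unusually large rows: about 1.05 GB spread over only 220 examples. The sketch below works out the average size per example and shows one possible way to read the `train` split. It assumes the repository ID taken from this page's download link resolves on the Hugging Face Hub and that the `datasets` package is installed. `load_train_split` and `mean_bytes_per_example` are hypothetical helper names, and streaming is a suggestion rather than anything the card prescribes.

```python
REPO_ID = "DarianNLP/mda_influence_scores_NEW_lr1e4_v2"  # from the page's download link

# Values copied from the `splits` metadata.
NUM_BYTES = 1_052_446_705
NUM_EXAMPLES = 220
DOWNLOAD_SIZE = 1_062_578_491


def mean_bytes_per_example(num_bytes: int = NUM_BYTES,
                           num_examples: int = NUM_EXAMPLES) -> float:
    """Average uncompressed size of one example, in bytes."""
    return num_bytes / num_examples


def load_train_split(streaming: bool = True):
    """Open the train split; needs network access and the `datasets` package.

    With streaming=True, rows are read lazily instead of downloading
    all files up front.
    """
    from datasets import load_dataset  # imported lazily so the arithmetic works offline

    return load_dataset(REPO_ID, split="train", streaming=streaming)


# Roughly 4.56 MiB per example on average.
mib_per_example = mean_bytes_per_example() / 2**20
```

A call such as `next(iter(load_train_split()))` would fetch a single row. Given the average row size, inspecting one row at a time is likely more practical than materializing the whole split in memory.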
Provider:
DarianNLP



