DarianNLP/final_sae_refusal_dataset_NEW
Source: Hugging Face. Updated: 2026-04-14. Indexed: 2026-04-26.
Download link:
https://hf-mirror.com/datasets/DarianNLP/final_sae_refusal_dataset_NEW
Resource description:
---
dataset_info:
  features:
  - name: prompt
    dtype: string
  - name: llama_response
    dtype: string
  - name: category
    dtype: string
  - name: reason
    dtype: string
  - name: refused
    dtype: int64
  - name: prompt_label
    dtype: string
  - name: source
    dtype: string
  - name: feat_3915_abuse_harassment_and_humiliating_harm_requests_harmful
    dtype: float64
  - name: feat_9623_misinformation_and_fabricated_narrative_requests_harmful
    dtype: float64
  - name: feat_1636_privacy_intrusion_data_request_harmful
    dtype: float64
  - name: feat_10940_stereotyping_protected_groups_as_inferior_harmful
    dtype: float64
  - name: feat_10878_stereotype_based_demeaning_character_portrayals_harmful
    dtype: float64
  - name: feat_3772_biased_demeaning_requests_about_protected_groups_harmful
    dtype: float64
  - name: feat_6733_copyright_defamation_misuse_requests_harmful
    dtype: float64
  - name: feat_5098_corporate_email_admin_task_requests_weak_harmful
    dtype: float64
  - name: feat_1986_violent_sadistic_scene_requests_harmful
    dtype: float64
  - name: feat_603_biased_inferiority_and_illicit_trade_harm_requests_harmful
    dtype: float64
  - name: feat_11236_degrading_prejudice_toward_marginalized_groups_harmful
    dtype: float64
  - name: feat_11840_corporate_email_admin_document_text_harmful
    dtype: float64
  - name: feat_2197_manipulative_abuse_grooming_solicitation_harmful
    dtype: float64
  - name: feat_11404_copyright_derivative_work_permission_requests_harmful
    dtype: float64
  - name: feat_1227_illicit_finance_fraud_evasion_requests_harmful
    dtype: float64
  - name: feat_5449_rumor_seeking_and_secret_sharing_requests_harmful
    dtype: float64
  - name: feat_9558_word_list_sentence_puzzle_prompts_harmful
    dtype: float64
  - name: feat_3248_deceptive_harm_fraud_hoaxes_and_misinformation_harmful
    dtype: float64
  - name: feat_3734_discriminatory_insult_harassment_requests_harmful
    dtype: float64
  - name: feat_2742_email_document_revision_and_legal_admin_text_harmful
    dtype: float64
  - name: feat_2684_corporate_email_attachment_logistics_harmful
    dtype: float64
  - name: feat_7294_deceptive_malicious_manipulation_requests_harmful
    dtype: float64
  - name: feat_6815_harmful_advocacy_with_deceptive_framing_harmful
    dtype: float64
  - name: feat_7940_defamation_and_fake_news_fabrication_requests_harmful
    dtype: float64
  - name: feat_1923_harmful_persuasion_manipulation_requests_harmful
    dtype: float64
  - name: feat_3368_graphic_violent_ritual_injury_descriptions_harmless
    dtype: float64
  - name: feat_6238_definition_of_risky_terms_queries_harmless
    dtype: float64
  - name: feat_7971_fictional_scenario_safety_secure_escape_questions_harmless
    dtype: float64
  - name: feat_7261_protective_safety_legal_compliance_advice_harmless
    dtype: float64
  - name: feat_286_email_subject_line_document_summarization_requests_harmless
    dtype: float64
  - name: feat_11223_competitive_skill_improvement_metaphors_harmless
    dtype: float64
  - name: feat_9994_enumerative_benign_business_science_prompts_harmless
    dtype: float64
  - name: feat_11382_copyright_compliance_and_policy_clarification_requests_harmless
    dtype: float64
  - name: feat_4112_historical_causation_of_genocides_and_revolutions_harmless
    dtype: float64
  - name: feat_1742_mundane_creative_editing_and_trivia_requests_harmless
    dtype: float64
  - name: feat_4205_coping_and_self_improvement_guidance_harmless
    dtype: float64
  - name: feat_7063_sensitive_social_organizational_explanation_requests_harmless
    dtype: float64
  - name: feat_8063_general_factual_cultural_benign_queries_harmless
    dtype: float64
  - name: feat_4590_stereotyping_and_generalization_prompts_harmless
    dtype: float64
  - name: feat_10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless
    dtype: float64
  - name: feat_7575_factual_summary_explanation_requests_about_benign_media_and_history_harmless
    dtype: float64
  - name: feat_5509_grammar_style_transformation_and_neutral_writing_prompts_harmless
    dtype: float64
  - name: feat_3895_general_factual_cultural_trivia_questions_harmless
    dtype: float64
  - name: feat_6499_generic_benign_writing_tasks_harmless
    dtype: float64
  - name: feat_11495_creative_ideation_and_invention_prompts_harmless
    dtype: float64
  - name: feat_2945_internal_business_email_logistics_harmless
    dtype: float64
  - name: feat_910_corporate_email_meeting_logistics_harmless
    dtype: float64
  - name: feat_4319_general_benign_informational_queries_harmless
    dtype: float64
  - name: feat_11813_toxic_social_bias_discriminatory_harm_requests_harmless
    dtype: float64
  - name: feat_3277_practical_everyday_advice_problem_solving_requests_harmless
    dtype: float64
  - name: feat_4396_public_figure_medical_history_inquiries_harmless
    dtype: float64
  - name: feat_7480_email_subject_line_and_message_summary_requests_harmless
    dtype: float64
  - name: feat_3767_copyright_safe_creative_and_formatting_requests_weak_harmless
    dtype: float64
  - name: feat_728_safety_health_advice_and_benign_information_requests_harmless
    dtype: float64
  - name: feat_6213_email_admin_subject_line_drafting_harmless
    dtype: float64
  - name: feat_9267_video_game_combat_action_scenario_requests_harmless
    dtype: float64
  - name: feat_10737_general_benign_informational_explanation_requests_harmless
    dtype: float64
  - name: feat_4986_ethical_classification_recommendation_task_prompts_harmless
    dtype: float64
  - name: feat_4202_sensitive_info_policy_complaint_requests_harmless
    dtype: float64
  - name: feat_1083_trivia_lookup_and_benign_productivity_queries_harmless
    dtype: float64
  - name: feat_11756_copyrighted_media_permission_and_attribution_requests_harmless
    dtype: float64
  - name: feat_11861_anti_fraud_and_verification_requests_harmless
    dtype: float64
  - name: feat_6562_legitimate_verification_and_administrative_request_prompts_harmless
    dtype: float64
  - name: feat_1728_general_purpose_benign_inquiry_informational_request_harmless
    dtype: float64
  - name: feat_823_basic_arithmetic_and_sequence_number_pattern_problems_harmless
    dtype: float64
  - name: feat_3868_address_and_contact_lookup_requests_for_institutions_and_fictional_entities_harmless
    dtype: float64
  - name: feat_7897_administrative_verification_request_language_harmless
    dtype: float64
  - name: feat_2286_email_document_subject_line_extraction_harmless
    dtype: float64
  - name: feat_12232_general_benign_informational_question_answering_harmless
    dtype: float64
  - name: feat_8184_risk_avoidance_verification_requests_harmless
    dtype: float64
  - name: feat_8240_trivia_and_classification_questions_harmless
    dtype: float64
  - name: feat_11634_privacy_identity_check_and_harmless_business_inquiry_requests_harmless
    dtype: float64
  - name: feat_1748_comparison_framed_social_cultural_difference_questions_harmless
    dtype: float64
  - name: feat_2827_neutral_procedural_text_with_risky_content_overlap_harmless
    dtype: float64
  - name: feat_8140_email_task_management_correspondence_harmless
    dtype: float64
  splits:
  - name: train
    num_bytes: 17549410
    num_examples: 11000
  download_size: 9687095
  dataset_size: 17549410
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
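
The `feat_*` column names in the metadata above appear to follow the pattern `feat_<number>_<description>_<harmful|harmless>`, with an optional `weak` marker before the final tag. The card does not document this convention, so the sketch below is only a reading of the names themselves: treating the number as a feature index and the suffix as a label is an assumption, and `FeatureColumn` and `parse_feature_column` are hypothetical helpers, not part of the dataset.

```python
import re
from typing import NamedTuple, Optional


class FeatureColumn(NamedTuple):
    index: int        # number after "feat_" (assumed to be a feature index)
    description: str  # snake_case text between the index and the label
    label: str        # trailing tag: "harmful" or "harmless"
    weak: bool        # True when "_weak" precedes the trailing tag


# Assumed naming pattern, inferred only from the column names in the card.
_PATTERN = re.compile(r"^feat_(\d+)_(.+?)(_weak)?_(harmful|harmless)$")


def parse_feature_column(name: str) -> Optional[FeatureColumn]:
    """Split a feat_* column name into its parts; return None for other columns."""
    match = _PATTERN.match(name)
    if match is None:
        return None
    index, description, weak, label = match.groups()
    return FeatureColumn(int(index), description, label, weak is not None)


# A few column names copied from the metadata above.
columns = [
    "prompt",
    "refused",
    "feat_3915_abuse_harassment_and_humiliating_harm_requests_harmful",
    "feat_5098_corporate_email_admin_task_requests_weak_harmful",
    "feat_823_basic_arithmetic_and_sequence_number_pattern_problems_harmless",
]
parsed = [p for p in map(parse_feature_column, columns) if p is not None]
for p in parsed:
    print(p.index, p.label, "weak" if p.weak else "", p.description)
```

Non-feature columns such as `prompt` and `refused` return `None`, so the same helper can be used to separate the text fields from the float-valued feature columns.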
Provider:
DarianNLP
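
Since the metadata describes a single `default` config with one `train` split, loading it with the Hugging Face `datasets` library could look like the sketch below. The repository id is taken from the path of the download link and is assumed to resolve on the Hub; `load_refusal_train`, `feature_columns`, and `refusal_rate` are hypothetical helper names, and treating `refused` as a 0/1 flag is an assumption based only on its `int64` dtype.

```python
from typing import Iterable, List


def load_refusal_train(repo_id: str = "DarianNLP/final_sae_refusal_dataset_NEW"):
    """Load the train split; needs the `datasets` package and network access."""
    # Imported lazily so the pure-Python helpers below work without `datasets`.
    from datasets import load_dataset

    return load_dataset(repo_id, split="train")


def feature_columns(column_names: Iterable[str]) -> List[str]:
    """Keep only the float-valued feat_* columns listed in the card."""
    return [name for name in column_names if name.startswith("feat_")]


def refusal_rate(refused_values: Iterable[int]) -> float:
    """Fraction of rows flagged as refused (assumes values are 0 or 1)."""
    values = list(refused_values)
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    ds = load_refusal_train()
    print(ds.num_rows, "rows")
    print(len(feature_columns(ds.column_names)), "feature columns")
    print("refusal rate:", refusal_rate(ds["refused"]))
```

When working through a mirror such as the one in the download link, the usual approach is to point the `HF_ENDPOINT` environment variable at the mirror before importing `datasets`; whether that is needed depends on your network setup.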



