DarianNLP/sae_feature_activations_NEW

Hugging Face, updated 2026-04-14, indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/DarianNLP/sae_feature_activations_NEW
Resource summary:
---
dataset_info:
  features:
  - name: text
    dtype: string
  - name: label
    dtype: string
  - name: source
    dtype: string
  - name: feat_3915_abuse_harassment_and_humiliating_harm_requests_harmful
    dtype: float64
  - name: feat_9623_misinformation_and_fabricated_narrative_requests_harmful
    dtype: float64
  - name: feat_1636_privacy_intrusion_data_request_harmful
    dtype: float64
  - name: feat_10940_stereotyping_protected_groups_as_inferior_harmful
    dtype: float64
  - name: feat_10878_stereotype_based_demeaning_character_portrayals_harmful
    dtype: float64
  - name: feat_3772_biased_demeaning_requests_about_protected_groups_harmful
    dtype: float64
  - name: feat_6733_copyright_defamation_misuse_requests_harmful
    dtype: float64
  - name: feat_5098_corporate_email_admin_task_requests_weak_harmful
    dtype: float64
  - name: feat_1986_violent_sadistic_scene_requests_harmful
    dtype: float64
  - name: feat_603_biased_inferiority_and_illicit_trade_harm_requests_harmful
    dtype: float64
  - name: feat_11236_degrading_prejudice_toward_marginalized_groups_harmful
    dtype: float64
  - name: feat_11840_corporate_email_admin_document_text_harmful
    dtype: float64
  - name: feat_2197_manipulative_abuse_grooming_solicitation_harmful
    dtype: float64
  - name: feat_11404_copyright_derivative_work_permission_requests_harmful
    dtype: float64
  - name: feat_1227_illicit_finance_fraud_evasion_requests_harmful
    dtype: float64
  - name: feat_5449_rumor_seeking_and_secret_sharing_requests_harmful
    dtype: float64
  - name: feat_9558_word_list_sentence_puzzle_prompts_harmful
    dtype: float64
  - name: feat_3248_deceptive_harm_fraud_hoaxes_and_misinformation_harmful
    dtype: float64
  - name: feat_3734_discriminatory_insult_harassment_requests_harmful
    dtype: float64
  - name: feat_2742_email_document_revision_and_legal_admin_text_harmful
    dtype: float64
  - name: feat_2684_corporate_email_attachment_logistics_harmful
    dtype: float64
  - name: feat_7294_deceptive_malicious_manipulation_requests_harmful
    dtype: float64
  - name: feat_6815_harmful_advocacy_with_deceptive_framing_harmful
    dtype: float64
  - name: feat_7940_defamation_and_fake_news_fabrication_requests_harmful
    dtype: float64
  - name: feat_1923_harmful_persuasion_manipulation_requests_harmful
    dtype: float64
  - name: feat_3368_graphic_violent_ritual_injury_descriptions_harmless
    dtype: float64
  - name: feat_6238_definition_of_risky_terms_queries_harmless
    dtype: float64
  - name: feat_7971_fictional_scenario_safety_secure_escape_questions_harmless
    dtype: float64
  - name: feat_7261_protective_safety_legal_compliance_advice_harmless
    dtype: float64
  - name: feat_286_email_subject_line_document_summarization_requests_harmless
    dtype: float64
  - name: feat_11223_competitive_skill_improvement_metaphors_harmless
    dtype: float64
  - name: feat_9994_enumerative_benign_business_science_prompts_harmless
    dtype: float64
  - name: feat_11382_copyright_compliance_and_policy_clarification_requests_harmless
    dtype: float64
  - name: feat_4112_historical_causation_of_genocides_and_revolutions_harmless
    dtype: float64
  - name: feat_1742_mundane_creative_editing_and_trivia_requests_harmless
    dtype: float64
  - name: feat_4205_coping_and_self_improvement_guidance_harmless
    dtype: float64
  - name: feat_7063_sensitive_social_organizational_explanation_requests_harmless
    dtype: float64
  - name: feat_8063_general_factual_cultural_benign_queries_harmless
    dtype: float64
  - name: feat_4590_stereotyping_and_generalization_prompts_harmless
    dtype: float64
  - name: feat_10000_neutral_factual_general_knowledge_and_email_writing_prompts_harmless
    dtype: float64
  - name: feat_7575_factual_summary_explanation_requests_about_benign_media_and_history_harmless
    dtype: float64
  - name: feat_5509_grammar_style_transformation_and_neutral_writing_prompts_harmless
    dtype: float64
  - name: feat_3895_general_factual_cultural_trivia_questions_harmless
    dtype: float64
  - name: feat_6499_generic_benign_writing_tasks_harmless
    dtype: float64
  - name: feat_11495_creative_ideation_and_invention_prompts_harmless
    dtype: float64
  - name: feat_2945_internal_business_email_logistics_harmless
    dtype: float64
  - name: feat_910_corporate_email_meeting_logistics_harmless
    dtype: float64
  - name: feat_4319_general_benign_informational_queries_harmless
    dtype: float64
  - name: feat_11813_toxic_social_bias_discriminatory_harm_requests_harmless
    dtype: float64
  - name: feat_3277_practical_everyday_advice_problem_solving_requests_harmless
    dtype: float64
  - name: feat_4396_public_figure_medical_history_inquiries_harmless
    dtype: float64
  - name: feat_7480_email_subject_line_and_message_summary_requests_harmless
    dtype: float64
  - name: feat_3767_copyright_safe_creative_and_formatting_requests_weak_harmless
    dtype: float64
  - name: feat_728_safety_health_advice_and_benign_information_requests_harmless
    dtype: float64
  - name: feat_6213_email_admin_subject_line_drafting_harmless
    dtype: float64
  - name: feat_9267_video_game_combat_action_scenario_requests_harmless
    dtype: float64
  - name: feat_10737_general_benign_informational_explanation_requests_harmless
    dtype: float64
  - name: feat_4986_ethical_classification_recommendation_task_prompts_harmless
    dtype: float64
  - name: feat_4202_sensitive_info_policy_complaint_requests_harmless
    dtype: float64
  - name: feat_1083_trivia_lookup_and_benign_productivity_queries_harmless
    dtype: float64
  - name: feat_11756_copyrighted_media_permission_and_attribution_requests_harmless
    dtype: float64
  - name: feat_11861_anti_fraud_and_verification_requests_harmless
    dtype: float64
  - name: feat_6562_legitimate_verification_and_administrative_request_prompts_harmless
    dtype: float64
  - name: feat_1728_general_purpose_benign_inquiry_informational_request_harmless
    dtype: float64
  - name: feat_823_basic_arithmetic_and_sequence_number_pattern_problems_harmless
    dtype: float64
  - name: feat_3868_address_and_contact_lookup_requests_for_institutions_and_fictional_entities_harmless
    dtype: float64
  - name: feat_7897_administrative_verification_request_language_harmless
    dtype: float64
  - name: feat_2286_email_document_subject_line_extraction_harmless
    dtype: float64
  - name: feat_12232_general_benign_informational_question_answering_harmless
    dtype: float64
  - name: feat_8184_risk_avoidance_verification_requests_harmless
    dtype: float64
  - name: feat_8240_trivia_and_classification_questions_harmless
    dtype: float64
  - name: feat_11634_privacy_identity_check_and_harmless_business_inquiry_requests_harmless
    dtype: float64
  - name: feat_1748_comparison_framed_social_cultural_difference_questions_harmless
    dtype: float64
  - name: feat_2827_neutral_procedural_text_with_risky_content_overlap_harmless
    dtype: float64
  - name: feat_8140_email_task_management_correspondence_harmless
    dtype: float64
  splits:
  - name: train
    num_bytes: 8170606
    num_examples: 11000
  download_size: 5110354
  dataset_size: 8170606
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
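Apart from `text`, `label` and `source`, every column in the schema appears to follow the pattern `feat_<number>_<description>_<harmful|harmless>`. The card does not document this convention, and the number is only presumably an SAE latent index. The following is a minimal Python sketch under that assumption: it splits the column names into their parts and averages one row's activations separately over harmful-tagged and harmless-tagged features. `FeatureColumn`, `parse_feature_columns` and `mean_activation_by_polarity` are hypothetical helpers, not part of the dataset or of any library.

```python
import re
from typing import NamedTuple

# Assumed naming convention, inferred from the column names in the schema:
#   feat_<number>_<description>_<harmful|harmless>
# The greedy (.+) backtracks so that only the LAST "_harmful"/"_harmless" is the tag.
FEATURE_RE = re.compile(r"^feat_(\d+)_(.+)_(harmful|harmless)$")


class FeatureColumn(NamedTuple):
    column: str       # full column name as it appears in the dataset
    latent_id: int    # number embedded in the name (assumed SAE latent index)
    description: str  # e.g. "privacy_intrusion_data_request"
    polarity: str     # trailing tag: "harmful" or "harmless"


def parse_feature_columns(column_names):
    """Parse activation column names; non-matching ones (text, label, source) are skipped."""
    parsed = []
    for name in column_names:
        match = FEATURE_RE.match(name)
        if match:
            parsed.append(
                FeatureColumn(name, int(match.group(1)), match.group(2), match.group(3))
            )
    return parsed


def mean_activation_by_polarity(row, features):
    """Average a row's float64 activations over each polarity group (0.0 if a group is empty)."""
    sums = {"harmful": 0.0, "harmless": 0.0}
    counts = {"harmful": 0, "harmless": 0}
    for feature in features:
        sums[feature.polarity] += float(row[feature.column])
        counts[feature.polarity] += 1
    return {
        polarity: sums[polarity] / counts[polarity] if counts[polarity] else 0.0
        for polarity in sums
    }
```

With the Hugging Face `datasets` library, the column list could come from `load_dataset("DarianNLP/sae_feature_activations_NEW", split="train").column_names`, and each indexed row (a dict) can be passed to `mean_activation_by_polarity`. The `weak` qualifier on `feat_5098` and `feat_3767` is simply left inside the parsed description.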
Provider: DarianNLP