DarianNLP/mda_influence_scores

Name: DarianNLP/mda_influence_scores
Creator: DarianNLP
Published: 2026-03-26 10:46:49
License: No description available

Hugging Face · Updated 2026-03-26 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/DarianNLP/mda_influence_scores
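
The link above points at a mirror rather than huggingface.co. A minimal sketch of loading the data through that mirror follows, assuming the mirror exposes the same API layout as the main hub and that the `datasets` library is installed. The helper names `dataset_page_url` and `load_train_split` are hypothetical and introduced only for illustration.

```python
import os

# Assumption: the mirror serves the same API layout as huggingface.co.
MIRROR_ENDPOINT = "https://hf-mirror.com"
REPO_ID = "DarianNLP/mda_influence_scores"


def dataset_page_url(repo_id: str, endpoint: str = MIRROR_ENDPOINT) -> str:
    """Build the dataset page URL for a repo on the given endpoint."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def load_train_split(repo_id: str = REPO_ID, endpoint: str = MIRROR_ENDPOINT):
    """Load the train split via the mirror (needs network access and `datasets`)."""
    # HF_ENDPOINT is read when huggingface_hub is first imported, so it is
    # set here before the import below. If the library was already imported
    # elsewhere in the process, this setting may have no effect.
    os.environ["HF_ENDPOINT"] = endpoint
    from datasets import load_dataset  # imported lazily on purpose

    return load_dataset(repo_id, split="train")
```

Calling `load_train_split()` should return the single `train` split listed in the card metadata; the call is left out here because it requires network access.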


Resource overview:

---
dataset_info:
  features:
  - name: idx
    dtype: int64
  - name: prompt
    dtype: string
  - name: label
    dtype: string
  - name: source
    dtype: string
  - name: grad_norm
    dtype: float64
  - name: response
    dtype: string
  - name: refusal_influence
    dtype: float64
  - name: influence_feat_10000_direct_inquiries_and_tasks_harmless
    dtype: float64
  - name: influence_feat_10737_explaining_concepts_or_skills_harmless
    dtype: float64
  - name: influence_feat_1083_direct_problem_solving_queries_harmless
    dtype: float64
  - name: influence_feat_10878_directed_fictional_content_generation_harmful
    dtype: float64
  - name: influence_feat_10940_request_for_negative_group_stereotypes_harmful
    dtype: float64
  - name: influence_feat_11007_requests_for_sensitive_details_harmful
    dtype: float64
  - name: influence_feat_11223_metaphorical_aggression_for_positive_goals_harmless
    dtype: float64
  - name: influence_feat_11236_prompting_group_denigration_harmful
    dtype: float64
  - name: influence_feat_11382_factual_and_explanatory_content_requests_harmless
    dtype: float64
  - name: influence_feat_11430_harmful_prescriptive_medical_advice_harmful
    dtype: float64
  - name: influence_feat_11634_specific_information_content_requests_harmless
    dtype: float64
  - name: influence_feat_11761_specific_instructional_prompts_harmful
    dtype: float64
  - name: influence_feat_11861_direct_task_information_request_harmless
    dtype: float64
  - name: influence_feat_12102_structured_information_request_harmful
    dtype: float64
  - name: influence_feat_12232_specific_information_procedure_requests_harmless
    dtype: float64
  - name: influence_feat_1227_harmful_framing_or_instruction_harmful
    dtype: float64
  - name: influence_feat_1581_potentially_problematic_information_requests_harmful
    dtype: float64
  - name: influence_feat_1636_requesting_sensitive_private_data_harmful
    dtype: float64
  - name: influence_feat_1728_task_oriented_communication_processing_harmless
    dtype: float64
  - name: influence_feat_1742_general_assistant_tasks_harmless
    dtype: float64
  - name: influence_feat_1836_benign_informational_analytical_requests_harmless
    dtype: float64
  - name: influence_feat_2197_requests_for_exploitation_tactics_harmful
    dtype: float64
  - name: influence_feat_2827_language_requiring_scrutiny_harmless
    dtype: float64
  - name: influence_feat_286_corporate_contractual_financial_specifics_harmless
    dtype: float64
  - name: influence_feat_2945_harmless_procedural_technical_tasks_harmless
    dtype: float64
  - name: influence_feat_3183_instructional_text_operations_harmless
    dtype: float64
  - name: influence_feat_3248_crafting_deceptive_exploitative_content_harmful
    dtype: float64
  - name: influence_feat_3277_knowledge_and_explanation_queries_harmless
    dtype: float64
  - name: influence_feat_3368_guidance_for_self_improvement_understanding_harmless
    dtype: float64
  - name: influence_feat_3536_concise_information_extraction_harmless
    dtype: float64
  - name: influence_feat_3767_explicit_content_boundary_requests_harmless
    dtype: float64
  - name: influence_feat_3772_facilitating_misguidance_or_illicit_acts_harmful
    dtype: float64
  - name: influence_feat_3868_request_for_public_address_harmless
    dtype: float64
  - name: influence_feat_3895_direct_factual_retrieval_task_harmless
    dtype: float64
  - name: influence_feat_3915_harmful_instructional_content_harmful
    dtype: float64
  - name: influence_feat_4027_enumeration_and_list_prompts_harmless
    dtype: float64
  - name: influence_feat_4112_serious_factual_inquiry_harmless
    dtype: float64
  - name: influence_feat_4202_communication_&_information_security_harmless
    dtype: float64
  - name: influence_feat_4205_personal_difficulties_and_coping_harmless
    dtype: float64
  - name: influence_feat_4211_generating_group_based_discrimination_harmful
    dtype: float64
  - name: influence_feat_4319_specific,_direct_inquiry_instruction_harmless
    dtype: float64
  - name: influence_feat_4396_publicly_disclosed_personal_health_harmful
    dtype: float64
  - name: influence_feat_4528_organizational_risk_communications_harmful
    dtype: float64
  - name: influence_feat_4590_soliciting_discriminatory_narratives_harmless
    dtype: float64
  - name: influence_feat_4644_specific_information_retrieval_harmless
    dtype: float64
  - name: influence_feat_4991_conspiracy_theory_generation_promotion_harmful
    dtype: float64
  - name: influence_feat_5449_request_for_email_subject_lines_harmful
    dtype: float64
  - name: influence_feat_5505_email_subject_line_task_harmless
    dtype: float64
  - name: influence_feat_5509_simple_linguistic_task_prompts_harmless
    dtype: float64
  - name: influence_feat_573_specific_information_or_action_request_harmful
    dtype: float64
  - name: influence_feat_6090_information_management_and_analysis_harmless
    dtype: float64
  - name: influence_feat_6238_factual_definition_queries_harmless
    dtype: float64
  - name: influence_feat_6469_academic_advocacy_content_task_harmless
    dtype: float64
  - name: influence_feat_6499_constructive_guidance_seeking_harmless
    dtype: float64
  - name: influence_feat_6562_direct_task_information_seeking_harmless
    dtype: float64
  - name: influence_feat_6733_content_&_information_integrity_harmful
    dtype: float64
  - name: influence_feat_6815_generating_influential_content_harmful
    dtype: float64
  - name: influence_feat_7063_directive_explanations_&_creations_harmless
    dtype: float64
  - name: influence_feat_7100_requests_for_specific_actionable_data_harmful
    dtype: float64
  - name: influence_feat_7261_seeking_safety_and_protection_guidance_harmless
    dtype: float64
  - name: influence_feat_7294_propagating_harmful_or_fraudulent_actions_harmful
    dtype: float64
  - name: influence_feat_7480_email_subject_line_query_harmless
    dtype: float64
  - name: influence_feat_7565_instructional_content_generation_harmless
    dtype: float64
  - name: influence_feat_7575_explicit_information_requests_harmless
    dtype: float64
  - name: influence_feat_7677_constrained_text_generation_instruction_harmless
    dtype: float64
  - name: influence_feat_7897_action_oriented_practical_queries_harmless
    dtype: float64
  - name: influence_feat_7940_strategic_deception_or_corporate_manipulation_harmful
    dtype: float64
  - name: influence_feat_7971_factual_entity_information_queries_harmless
    dtype: float64
  - name: influence_feat_8063_noteworthy_entities'_positive_ethical_aspects_harmless
    dtype: float64
  - name: influence_feat_808_promoting_group_discrimination_by_traits_harmful
    dtype: float64
  - name: influence_feat_8193_detailed_content_generation_requests_harmful
    dtype: float64
  - name: influence_feat_823_instructions_for_mathematical_tasks_harmless
    dtype: float64
  - name: influence_feat_8240_explicit_structured_information_tasks_harmless
    dtype: float64
  - name: influence_feat_8643_neutral_framed_content_tasks_harmless
    dtype: float64
  - name: influence_feat_9081_managing_sensitive_critical_information_harmful
    dtype: float64
  - name: influence_feat_910_email_subject_line_task_harmless
    dtype: float64
  - name: influence_feat_9267_contextually_harmless_fictional_scenarios_harmless
    dtype: float64
  - name: influence_feat_9623_disinformation_for_manipulation_harmful
    dtype: float64
  - name: influence_feat_9649_**soliciting_harmful_content_creation**_harmful
    dtype: float64
  - name: influence_feat_9994_requests_for_n_item_lists_harmless
    dtype: float64
  - name: top1_feature
    dtype: string
  - name: top1_influence
    dtype: float64
  - name: top2_feature
    dtype: string
  - name: top2_influence
    dtype: float64
  - name: top3_feature
    dtype: string
  - name: top3_influence
    dtype: float64
  - name: top4_feature
    dtype: string
  - name: top4_influence
    dtype: float64
  - name: top5_feature
    dtype: string
  - name: top5_influence
    dtype: float64
  - name: top5_most_influenced
    list:
    - name: feature
      dtype: string
    - name: influence
      dtype: float64
    - name: ridge_weight
      dtype: float64
  - name: top5_most_important
    list:
    - name: feature
      dtype: string
    - name: influence
      dtype: float64
    - name: ridge_weight
      dtype: float64
  splits:
  - name: train
    num_bytes: 436477
    num_examples: 186
  download_size: 298495
  dataset_size: 436477
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
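
The per-feature columns in the schema above appear to follow the pattern `influence_feat_<id>_<description>_<harmful|harmless>`. The sketch below splits such a column name into its parts and ranks the feature columns of one row. The pattern is inferred from the column names only, and ranking by absolute value is an assumption: the card does not say how the `top1_feature` to `top5_feature` columns were chosen. `FeatureColumn`, `parse_feature_column`, and `rank_features` are hypothetical names.

```python
import re
from typing import Dict, List, NamedTuple, Optional, Tuple

# Assumed naming pattern, inferred from the schema:
# influence_feat_<numeric id>_<description>_<harmful|harmless>
_COLUMN_RE = re.compile(r"^influence_feat_(\d+)_(.+)_(harmful|harmless)$")


class FeatureColumn(NamedTuple):
    feature_id: int
    description: str
    label: str  # trailing "harmful" or "harmless" tag of the column name


def parse_feature_column(name: str) -> Optional[FeatureColumn]:
    """Split a column name into id, description, and label; None if no match."""
    match = _COLUMN_RE.match(name)
    if match is None:
        return None
    return FeatureColumn(int(match.group(1)), match.group(2), match.group(3))


def rank_features(row: Dict[str, object], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k feature columns with the largest absolute influence.

    Assumption: absolute value is used as the ranking key; the dataset's own
    top-k columns may have been built with a different criterion.
    """
    scored = [
        (name, float(value))
        for name, value in row.items()
        if parse_feature_column(name) is not None and value is not None
    ]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:k]


# Toy row with made-up values, for illustration only (not taken from the data).
example_row = {
    "idx": 0,
    "refusal_influence": 0.9,
    "influence_feat_823_instructions_for_mathematical_tasks_harmless": -0.5,
    "influence_feat_3915_harmful_instructional_content_harmful": 0.2,
}
top = rank_features(example_row, k=1)
```

Columns such as `idx`, `refusal_influence`, and `top1_influence` do not match the pattern and are skipped, so only the per-feature scores take part in the ranking.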

Provided by:

DarianNLP
