DarianNLP/mda_influence_scores
Source: Hugging Face. Updated: 2026-03-26. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/DarianNLP/mda_influence_scores
Dataset description:
---
dataset_info:
  features:
  - name: idx
    dtype: int64
  - name: prompt
    dtype: string
  - name: label
    dtype: string
  - name: source
    dtype: string
  - name: grad_norm
    dtype: float64
  - name: response
    dtype: string
  - name: refusal_influence
    dtype: float64
  - name: influence_feat_10000_direct_inquiries_and_tasks_harmless
    dtype: float64
  - name: influence_feat_10737_explaining_concepts_or_skills_harmless
    dtype: float64
  - name: influence_feat_1083_direct_problem_solving_queries_harmless
    dtype: float64
  - name: influence_feat_10878_directed_fictional_content_generation_harmful
    dtype: float64
  - name: influence_feat_10940_request_for_negative_group_stereotypes_harmful
    dtype: float64
  - name: influence_feat_11007_requests_for_sensitive_details_harmful
    dtype: float64
  - name: influence_feat_11223_metaphorical_aggression_for_positive_goals_harmless
    dtype: float64
  - name: influence_feat_11236_prompting_group_denigration_harmful
    dtype: float64
  - name: influence_feat_11382_factual_and_explanatory_content_requests_harmless
    dtype: float64
  - name: influence_feat_11430_harmful_prescriptive_medical_advice_harmful
    dtype: float64
  - name: influence_feat_11634_specific_information_content_requests_harmless
    dtype: float64
  - name: influence_feat_11761_specific_instructional_prompts_harmful
    dtype: float64
  - name: influence_feat_11861_direct_task_information_request_harmless
    dtype: float64
  - name: influence_feat_12102_structured_information_request_harmful
    dtype: float64
  - name: influence_feat_12232_specific_information_procedure_requests_harmless
    dtype: float64
  - name: influence_feat_1227_harmful_framing_or_instruction_harmful
    dtype: float64
  - name: influence_feat_1581_potentially_problematic_information_requests_harmful
    dtype: float64
  - name: influence_feat_1636_requesting_sensitive_private_data_harmful
    dtype: float64
  - name: influence_feat_1728_task_oriented_communication_processing_harmless
    dtype: float64
  - name: influence_feat_1742_general_assistant_tasks_harmless
    dtype: float64
  - name: influence_feat_1836_benign_informational_analytical_requests_harmless
    dtype: float64
  - name: influence_feat_2197_requests_for_exploitation_tactics_harmful
    dtype: float64
  - name: influence_feat_2827_language_requiring_scrutiny_harmless
    dtype: float64
  - name: influence_feat_286_corporate_contractual_financial_specifics_harmless
    dtype: float64
  - name: influence_feat_2945_harmless_procedural_technical_tasks_harmless
    dtype: float64
  - name: influence_feat_3183_instructional_text_operations_harmless
    dtype: float64
  - name: influence_feat_3248_crafting_deceptive_exploitative_content_harmful
    dtype: float64
  - name: influence_feat_3277_knowledge_and_explanation_queries_harmless
    dtype: float64
  - name: influence_feat_3368_guidance_for_self_improvement_understanding_harmless
    dtype: float64
  - name: influence_feat_3536_concise_information_extraction_harmless
    dtype: float64
  - name: influence_feat_3767_explicit_content_boundary_requests_harmless
    dtype: float64
  - name: influence_feat_3772_facilitating_misguidance_or_illicit_acts_harmful
    dtype: float64
  - name: influence_feat_3868_request_for_public_address_harmless
    dtype: float64
  - name: influence_feat_3895_direct_factual_retrieval_task_harmless
    dtype: float64
  - name: influence_feat_3915_harmful_instructional_content_harmful
    dtype: float64
  - name: influence_feat_4027_enumeration_and_list_prompts_harmless
    dtype: float64
  - name: influence_feat_4112_serious_factual_inquiry_harmless
    dtype: float64
  - name: influence_feat_4202_communication_&_information_security_harmless
    dtype: float64
  - name: influence_feat_4205_personal_difficulties_and_coping_harmless
    dtype: float64
  - name: influence_feat_4211_generating_group_based_discrimination_harmful
    dtype: float64
  - name: influence_feat_4319_specific,_direct_inquiry_instruction_harmless
    dtype: float64
  - name: influence_feat_4396_publicly_disclosed_personal_health_harmful
    dtype: float64
  - name: influence_feat_4528_organizational_risk_communications_harmful
    dtype: float64
  - name: influence_feat_4590_soliciting_discriminatory_narratives_harmless
    dtype: float64
  - name: influence_feat_4644_specific_information_retrieval_harmless
    dtype: float64
  - name: influence_feat_4991_conspiracy_theory_generation_promotion_harmful
    dtype: float64
  - name: influence_feat_5449_request_for_email_subject_lines_harmful
    dtype: float64
  - name: influence_feat_5505_email_subject_line_task_harmless
    dtype: float64
  - name: influence_feat_5509_simple_linguistic_task_prompts_harmless
    dtype: float64
  - name: influence_feat_573_specific_information_or_action_request_harmful
    dtype: float64
  - name: influence_feat_6090_information_management_and_analysis_harmless
    dtype: float64
  - name: influence_feat_6238_factual_definition_queries_harmless
    dtype: float64
  - name: influence_feat_6469_academic_advocacy_content_task_harmless
    dtype: float64
  - name: influence_feat_6499_constructive_guidance_seeking_harmless
    dtype: float64
  - name: influence_feat_6562_direct_task_information_seeking_harmless
    dtype: float64
  - name: influence_feat_6733_content_&_information_integrity_harmful
    dtype: float64
  - name: influence_feat_6815_generating_influential_content_harmful
    dtype: float64
  - name: influence_feat_7063_directive_explanations_&_creations_harmless
    dtype: float64
  - name: influence_feat_7100_requests_for_specific_actionable_data_harmful
    dtype: float64
  - name: influence_feat_7261_seeking_safety_and_protection_guidance_harmless
    dtype: float64
  - name: influence_feat_7294_propagating_harmful_or_fraudulent_actions_harmful
    dtype: float64
  - name: influence_feat_7480_email_subject_line_query_harmless
    dtype: float64
  - name: influence_feat_7565_instructional_content_generation_harmless
    dtype: float64
  - name: influence_feat_7575_explicit_information_requests_harmless
    dtype: float64
  - name: influence_feat_7677_constrained_text_generation_instruction_harmless
    dtype: float64
  - name: influence_feat_7897_action_oriented_practical_queries_harmless
    dtype: float64
  - name: influence_feat_7940_strategic_deception_or_corporate_manipulation_harmful
    dtype: float64
  - name: influence_feat_7971_factual_entity_information_queries_harmless
    dtype: float64
  - name: influence_feat_8063_noteworthy_entities'_positive_ethical_aspects_harmless
    dtype: float64
  - name: influence_feat_808_promoting_group_discrimination_by_traits_harmful
    dtype: float64
  - name: influence_feat_8193_detailed_content_generation_requests_harmful
    dtype: float64
  - name: influence_feat_823_instructions_for_mathematical_tasks_harmless
    dtype: float64
  - name: influence_feat_8240_explicit_structured_information_tasks_harmless
    dtype: float64
  - name: influence_feat_8643_neutral_framed_content_tasks_harmless
    dtype: float64
  - name: influence_feat_9081_managing_sensitive_critical_information_harmful
    dtype: float64
  - name: influence_feat_910_email_subject_line_task_harmless
    dtype: float64
  - name: influence_feat_9267_contextually_harmless_fictional_scenarios_harmless
    dtype: float64
  - name: influence_feat_9623_disinformation_for_manipulation_harmful
    dtype: float64
  - name: influence_feat_9649_**soliciting_harmful_content_creation**_harmful
    dtype: float64
  - name: influence_feat_9994_requests_for_n_item_lists_harmless
    dtype: float64
  - name: top1_feature
    dtype: string
  - name: top1_influence
    dtype: float64
  - name: top2_feature
    dtype: string
  - name: top2_influence
    dtype: float64
  - name: top3_feature
    dtype: string
  - name: top3_influence
    dtype: float64
  - name: top4_feature
    dtype: string
  - name: top4_influence
    dtype: float64
  - name: top5_feature
    dtype: string
  - name: top5_influence
    dtype: float64
  - name: top5_most_influenced
    list:
    - name: feature
      dtype: string
    - name: influence
      dtype: float64
    - name: ridge_weight
      dtype: float64
  - name: top5_most_important
    list:
    - name: feature
      dtype: string
    - name: influence
      dtype: float64
    - name: ridge_weight
      dtype: float64
  splits:
  - name: train
    num_bytes: 436477
    num_examples: 186
  download_size: 298495
  dataset_size: 436477
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
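The per-feature columns in the schema above appear to follow the naming pattern `influence_feat_<id>_<description>_<harmful|harmless>`. The card does not document this pattern, so the following is only a minimal sketch under that assumption; `InfluenceColumn`, `parse_influence_column`, and `split_by_category` are hypothetical helper names, not part of the dataset or of any library.

```python
import re
from typing import Dict, Iterable, List, NamedTuple, Optional

# Assumed pattern, inferred from the column names in the schema:
# influence_feat_<numeric id>_<description>_<harmful|harmless>
_COLUMN_RE = re.compile(r"^influence_feat_(\d+)_(.+)_(harmful|harmless)$")


class InfluenceColumn(NamedTuple):
    """Hypothetical container for the parts of one column name."""
    feature_id: int
    description: str
    category: str  # "harmful" or "harmless", taken from the name suffix


def parse_influence_column(name: str) -> Optional[InfluenceColumn]:
    """Split a column name into its parts; return None if it does not match."""
    match = _COLUMN_RE.match(name)
    if match is None:
        return None
    return InfluenceColumn(int(match.group(1)), match.group(2), match.group(3))


def split_by_category(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group matching column names by their harmful/harmless suffix."""
    groups: Dict[str, List[str]] = {"harmful": [], "harmless": []}
    for name in names:
        parsed = parse_influence_column(name)
        if parsed is not None:
            groups[parsed.category].append(name)
    return groups
```

Columns such as `refusal_influence` or `top1_influence` do not match the pattern and are skipped, so the helper can be applied to the full column list without filtering first.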
Provider:
DarianNLP
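As a rough illustration of how the single `train` split (186 examples) might be loaded and inspected, here is a sketch using the Hugging Face `datasets` library. The repository ID comes from the listing above. The helper names `load_train_split` and `top_k_influences` are hypothetical. Ranking by absolute value is an assumption: the card does not say how the dataset's own `top1_feature` through `top5_feature` columns were ranked, so results may differ from them.

```python
from typing import Any, List, Mapping, Tuple


def load_train_split():
    """Load the train split from the Hub (needs `pip install datasets` and network access)."""
    # Imported lazily so the ranking helper below also works offline.
    from datasets import load_dataset
    return load_dataset("DarianNLP/mda_influence_scores", split="train")


def top_k_influences(row: Mapping[str, Any], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k per-feature influence columns with the largest absolute value.

    Assumption: ranking by |influence|; the dataset's own top-5 columns
    may use a different criterion.
    """
    scores = [
        (name, float(value))
        for name, value in row.items()
        if name.startswith("influence_feat_") and value is not None
    ]
    scores.sort(key=lambda item: abs(item[1]), reverse=True)
    return scores[:k]
```

With network access, `top_k_influences(load_train_split()[0])` would rank the first example's feature columns, and the result can be compared against that row's `top1_feature` and `top1_influence` values to check which ranking criterion the dataset actually uses.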



