DarianNLP/logprob_mda_dataset
Source: Hugging Face | Updated: 2026-03-20 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/DarianNLP/logprob_mda_dataset
Resource description:
---
dataset_info:
  features:
  - name: prompt
    dtype: string
  - name: llama_response
    dtype: string
  - name: refused
    dtype: int64
  - name: prompt_label
    dtype: string
  - name: source
    dtype: string
  - name: refusal_logprob
    dtype: float64
  - name: feat_10000_direct_inquiries_and_tasks_harmless
    dtype: float64
  - name: feat_10737_explaining_concepts_or_skills_harmless
    dtype: float64
  - name: feat_1083_direct_problem_solving_queries_harmless
    dtype: float64
  - name: feat_10878_directed_fictional_content_generation_harmful
    dtype: float64
  - name: feat_10940_request_for_negative_group_stereotypes_harmful
    dtype: float64
  - name: feat_11007_requests_for_sensitive_details_harmful
    dtype: float64
  - name: feat_11223_metaphorical_aggression_for_positive_goals_harmless
    dtype: float64
  - name: feat_11236_prompting_group_denigration_harmful
    dtype: float64
  - name: feat_11382_factual_and_explanatory_content_requests_harmless
    dtype: float64
  - name: feat_11430_harmful_prescriptive_medical_advice_harmful
    dtype: float64
  - name: feat_11634_specific_information_content_requests_harmless
    dtype: float64
  - name: feat_11761_specific_instructional_prompts_harmful
    dtype: float64
  - name: feat_11861_direct_task_information_request_harmless
    dtype: float64
  - name: feat_12102_structured_information_request_harmful
    dtype: float64
  - name: feat_12232_specific_information_procedure_requests_harmless
    dtype: float64
  - name: feat_1227_harmful_framing_or_instruction_harmful
    dtype: float64
  - name: feat_1581_potentially_problematic_information_requests_harmful
    dtype: float64
  - name: feat_1636_requesting_sensitive_private_data_harmful
    dtype: float64
  - name: feat_1728_task_oriented_communication_processing_harmless
    dtype: float64
  - name: feat_1742_general_assistant_tasks_harmless
    dtype: float64
  - name: feat_1836_benign_informational_analytical_requests_harmless
    dtype: float64
  - name: feat_2197_requests_for_exploitation_tactics_harmful
    dtype: float64
  - name: feat_2827_language_requiring_scrutiny_harmless
    dtype: float64
  - name: feat_286_corporate_contractual_financial_specifics_harmless
    dtype: float64
  - name: feat_2945_harmless_procedural_technical_tasks_harmless
    dtype: float64
  - name: feat_3183_instructional_text_operations_harmless
    dtype: float64
  - name: feat_3248_crafting_deceptive_exploitative_content_harmful
    dtype: float64
  - name: feat_3277_knowledge_and_explanation_queries_harmless
    dtype: float64
  - name: feat_3368_guidance_for_self_improvement_understanding_harmless
    dtype: float64
  - name: feat_3536_concise_information_extraction_harmless
    dtype: float64
  - name: feat_3767_explicit_content_boundary_requests_harmless
    dtype: float64
  - name: feat_3772_facilitating_misguidance_or_illicit_acts_harmful
    dtype: float64
  - name: feat_3868_request_for_public_address_harmless
    dtype: float64
  - name: feat_3895_direct_factual_retrieval_task_harmless
    dtype: float64
  - name: feat_3915_harmful_instructional_content_harmful
    dtype: float64
  - name: feat_4027_enumeration_and_list_prompts_harmless
    dtype: float64
  - name: feat_4112_serious_factual_inquiry_harmless
    dtype: float64
  - name: feat_4202_communication_&_information_security_harmless
    dtype: float64
  - name: feat_4205_personal_difficulties_and_coping_harmless
    dtype: float64
  - name: feat_4211_generating_group_based_discrimination_harmful
    dtype: float64
  - name: feat_4319_specific,_direct_inquiry_instruction_harmless
    dtype: float64
  - name: feat_4396_publicly_disclosed_personal_health_harmful
    dtype: float64
  - name: feat_4528_organizational_risk_communications_harmful
    dtype: float64
  - name: feat_4590_soliciting_discriminatory_narratives_harmless
    dtype: float64
  - name: feat_4644_specific_information_retrieval_harmless
    dtype: float64
  - name: feat_4991_conspiracy_theory_generation_promotion_harmful
    dtype: float64
  - name: feat_5449_request_for_email_subject_lines_harmful
    dtype: float64
  - name: feat_5505_email_subject_line_task_harmless
    dtype: float64
  - name: feat_5509_simple_linguistic_task_prompts_harmless
    dtype: float64
  - name: feat_573_specific_information_or_action_request_harmful
    dtype: float64
  - name: feat_6090_information_management_and_analysis_harmless
    dtype: float64
  - name: feat_6238_factual_definition_queries_harmless
    dtype: float64
  - name: feat_6469_academic_advocacy_content_task_harmless
    dtype: float64
  - name: feat_6499_constructive_guidance_seeking_harmless
    dtype: float64
  - name: feat_6562_direct_task_information_seeking_harmless
    dtype: float64
  - name: feat_6733_content_&_information_integrity_harmful
    dtype: float64
  - name: feat_6815_generating_influential_content_harmful
    dtype: float64
  - name: feat_7063_directive_explanations_&_creations_harmless
    dtype: float64
  - name: feat_7100_requests_for_specific_actionable_data_harmful
    dtype: float64
  - name: feat_7261_seeking_safety_and_protection_guidance_harmless
    dtype: float64
  - name: feat_7294_propagating_harmful_or_fraudulent_actions_harmful
    dtype: float64
  - name: feat_7480_email_subject_line_query_harmless
    dtype: float64
  - name: feat_7565_instructional_content_generation_harmless
    dtype: float64
  - name: feat_7575_explicit_information_requests_harmless
    dtype: float64
  - name: feat_7677_constrained_text_generation_instruction_harmless
    dtype: float64
  - name: feat_7897_action_oriented_practical_queries_harmless
    dtype: float64
  - name: feat_7940_strategic_deception_or_corporate_manipulation_harmful
    dtype: float64
  - name: feat_7971_factual_entity_information_queries_harmless
    dtype: float64
  - name: feat_8063_noteworthy_entities'_positive_ethical_aspects_harmless
    dtype: float64
  - name: feat_808_promoting_group_discrimination_by_traits_harmful
    dtype: float64
  - name: feat_8193_detailed_content_generation_requests_harmful
    dtype: float64
  - name: feat_823_instructions_for_mathematical_tasks_harmless
    dtype: float64
  - name: feat_8240_explicit_structured_information_tasks_harmless
    dtype: float64
  - name: feat_8643_neutral_framed_content_tasks_harmless
    dtype: float64
  - name: feat_9081_managing_sensitive_critical_information_harmful
    dtype: float64
  - name: feat_910_email_subject_line_task_harmless
    dtype: float64
  - name: feat_9267_contextually_harmless_fictional_scenarios_harmless
    dtype: float64
  - name: feat_9623_disinformation_for_manipulation_harmful
    dtype: float64
  - name: feat_9649_**soliciting_harmful_content_creation**_harmful
    dtype: float64
  - name: feat_9994_requests_for_n_item_lists_harmless
    dtype: float64
  splits:
  - name: train
    num_bytes: 7048606
    num_examples: 5500
  download_size: 4181665
  dataset_size: 7048606
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
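Every feature column in the card follows one visible pattern: `feat_`, a numeric id, a snake_case description, and a trailing `harmful` or `harmless` label. A minimal sketch for splitting those names apart, assuming that pattern holds for every `feat_*` column (the card never states it), might look like this; `parse_feature_column` and `group_by_polarity` are hypothetical helper names, not part of the dataset or any library.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional

# Assumed naming scheme, inferred from the column names in this card (not documented by it):
#   feat_<numeric id>_<description>_<harmful|harmless>
# The greedy (.+) plus the anchored suffix means only the LAST _harmful/_harmless
# is treated as the label, so feat_1227_harmful_framing_or_instruction_harmful parses correctly.
FEATURE_RE = re.compile(r"^feat_(\d+)_(.+)_(harmful|harmless)$")


class FeatureColumn(NamedTuple):
    column: str       # original column name, kept verbatim
    feature_id: int   # numeric prefix, e.g. 10000
    description: str  # middle part, e.g. "direct_inquiries_and_tasks"
    polarity: str     # trailing label: "harmful" or "harmless"


def parse_feature_column(name: str) -> Optional[FeatureColumn]:
    """Split a feat_* column name into its parts; return None for any other column."""
    m = FEATURE_RE.match(name)
    if m is None:
        return None
    return FeatureColumn(name, int(m.group(1)), m.group(2), m.group(3))


def group_by_polarity(columns: Iterable[str]) -> Dict[str, List[FeatureColumn]]:
    """Bucket feature columns by their trailing label, each bucket sorted by feature id."""
    groups: Dict[str, List[FeatureColumn]] = defaultdict(list)
    for name in columns:
        parsed = parse_feature_column(name)
        if parsed is not None:  # skips prompt, refused, refusal_logprob, etc.
            groups[parsed.polarity].append(parsed)
    for bucket in groups.values():
        bucket.sort(key=lambda c: c.feature_id)
    return dict(groups)
```

Applied to `ds.column_names` from a loaded `datasets.Dataset`, this would separate the two label groups. The suffix is read verbatim: the card does not explain how it was assigned, and a few pairings, such as `feat_4590_soliciting_discriminatory_narratives_harmless` or `feat_5449_request_for_email_subject_lines_harmful`, suggest treating it as a label rather than a verified judgment.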
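The card lists an integer `refused` column and a continuous `refusal_logprob` column next to 80 float-valued feature columns, but does not say how they relate. One quick first look, sketched here under the assumptions that `refused` is a 0/1 flag, that feature values are plain floats, and that rows arrive as dicts keyed by the card's column names, is to summarize refusals per `prompt_label` and rank features by their Pearson correlation with `refusal_logprob`. `summarize_by`, `pearson`, and `rank_features` are hypothetical helpers.

```python
import math
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence, Tuple

Row = Mapping[str, object]  # one dataset row, keyed by the card's column names


def summarize_by(rows: Sequence[Row], key: str = "prompt_label") -> Dict[object, Dict[str, float]]:
    """Per-group row count, refusal rate, and mean refusal_logprob.

    Assumes `refused` is 0/1 and `refusal_logprob` is a float, as the card's
    int64/float64 dtypes suggest; the card does not define either column.
    """
    acc = defaultdict(lambda: [0, 0, 0.0])  # [count, refusals, logprob sum]
    for row in rows:
        a = acc[row[key]]
        a[0] += 1
        a[1] += int(row["refused"])
        a[2] += float(row["refusal_logprob"])
    return {
        group: {"n": n, "refusal_rate": r / n, "mean_refusal_logprob": s / n}
        for group, (n, r, s) in acc.items()
    }


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Plain Pearson correlation; NaN if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def rank_features(rows: Sequence[Row], target: str = "refusal_logprob") -> List[Tuple[str, float]]:
    """Feature columns sorted by |Pearson r| with `target`, strongest first, NaN last."""
    if not rows:
        return []
    feature_cols = [c for c in rows[0] if c.startswith("feat_")]
    ys = [float(r[target]) for r in rows]
    scores = [(c, pearson([float(r[c]) for r in rows], ys)) for c in feature_cols]
    return sorted(scores, key=lambda cs: math.inf if math.isnan(cs[1]) else -abs(cs[1]))
```

With the `datasets` library installed, `rows = list(load_dataset("DarianNLP/logprob_mda_dataset", split="train"))` should produce row dicts in this shape (that call needs network access and is not exercised by the sketch). Pearson r is a marginal, linear measure, so it says nothing about features acting jointly; a regression over all feature columns would be the natural next step.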
Provided by:
DarianNLP



