Do_not_answer
arXiv, indexed 2025-09-30
Download link:
https://huggingface.co/LibrAI/longformer-harmful-ro
Overview:
This open-source dataset is designed to evaluate the safety mechanisms of large language models (LLMs). It consists of prompts that a responsible language model should not answer; the responses models generate to these prompts are analyzed to assess potential harm. The dataset covers five risk areas and 12 harm types and contains 939 instructions in total.
Provided by:
Open-source contributors



