shalanova/benchmark-3-chinese-gt
Hugging Face · Updated 2026-04-30 · Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/shalanova/benchmark-3-chinese-gt
Description:
This dataset is a collection of safe and unsafe prompts: 200 prompts in total (100 safe, 100 unsafe). The unsafe prompts span heterogeneous categories (e.g., harmful instructions, sensitive topics, adversarial rephrasings) and do not necessarily follow canonical jailbreak templates. This diversity and distributional variability make similarity-based detection more challenging and provide a stress test for cross-lingual transfer. The dataset has four columns: text (the original prompt), label (0 = safe, 1 = unsafe), translation (the prompt translated into Chinese with Google Translate), and score_zh_google (the cosine similarity score against a codebook).
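Because each row pairs a similarity score with a ground-truth label, one natural use is to measure how well `score_zh_google` separates unsafe from safe prompts. The sketch below is illustrative only. The card does not name the embedding model or say how a prompt is compared with several codebook entries, so taking the maximum cosine similarity over the codebook is an assumption. The helper names (`codebook_score`, `auroc`, `accuracy_at_threshold`) and the toy 2-D vectors are hypothetical. AUROC is computed here with the pairwise (Mann-Whitney) definition.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def codebook_score(embedding: Sequence[float], codebook: Sequence[Sequence[float]]) -> float:
    """ASSUMPTION: a prompt's score is its maximum cosine similarity to any codebook entry."""
    return max(cosine(embedding, c) for c in codebook)


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random unsafe prompt (label 1) outscores a random safe one (label 0).

    Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one safe and one unsafe example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_at_threshold(scores: Sequence[float], labels: Sequence[int], t: float) -> float:
    """Fraction correct when a prompt is flagged unsafe iff its score >= t."""
    correct = sum(int(s >= t) == y for s, y in zip(scores, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Toy 2-D "embeddings" (hypothetical): one codebook direction stands for unsafe content.
    codebook = [[1.0, 0.0]]
    embeddings = [[0.9, 0.1], [1.0, 0.2], [0.1, 1.0], [0.0, 1.0]]
    labels = [1, 1, 0, 0]
    scores = [codebook_score(e, codebook) for e in embeddings]
    print(auroc(scores, labels))                       # 1.0
    print(accuracy_at_threshold(scores, labels, 0.5))  # 1.0
```

With the real data, `scores` would be the `score_zh_google` column and `labels` the `label` column, so `auroc` and `accuracy_at_threshold` can be applied directly without recomputing any embeddings.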
Provided by:
shalanova



