shalanova/benchmark-4-chinese-gt
Hugging Face | Updated 2026-04-30 | Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/shalanova/benchmark-4-chinese-gt
Description:
This dataset is a Chinese translation, produced with Google Translate, of prompts sourced from nvidia/Aegis-AI-Content-Safety-Dataset-2.0. It covers heterogeneous unsafe categories (e.g., harmful instructions, sensitive topics, adversarial rephrasings) and includes prompts that do not necessarily follow canonical jailbreak templates. This diversity and distributional variability make similarity-based detection more challenging and provide a stress test for cross-lingual transfer. The dataset contains 1,000 prompts (500 safe / 500 unsafe) with the following columns: text (original prompt), label (0: safe, 1: unsafe), translation (the prompt translated into Chinese by Google Translate), and score_zh_google (cosine similarity score against the codebook).
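As a rough illustration of how the label and score_zh_google columns might be used together, the sketch below evaluates a simple similarity-threshold detector. It rests on assumptions not stated in the listing: that a higher score_zh_google means a prompt is more likely unsafe, and that a single cutoff is a reasonable decision rule. The function name `threshold_metrics`, the cutoff value, and the toy rows are all hypothetical and are not taken from the dataset.

```python
from typing import Dict, Sequence


def threshold_metrics(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Dict[str, float]:
    """Score a threshold detector: predict unsafe (1) when score >= threshold.

    Assumption: higher cosine similarity to the codebook indicates "unsafe".
    The dataset listing does not confirm this direction.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one row is required")

    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1

    # Guard against division by zero when nothing is predicted or labeled unsafe.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(scores)
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical toy rows standing in for (score_zh_google, label) pairs.
toy_scores = [0.9, 0.2, 0.6, 0.4]
toy_labels = [1, 0, 0, 1]
metrics = threshold_metrics(toy_scores, toy_labels, threshold=0.5)
print(metrics)
```

With the real data, the two columns could be passed in place of the toy lists, for example after loading the repository with the Hugging Face `datasets` library; the split names are not given in the listing and would need to be checked first.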
Provider:
shalanova



