shalanova/benchmark-4-chinese-m2m
Source: Hugging Face | Updated: 2026-04-30 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/shalanova/benchmark-4-chinese-m2m
Description:
Domain: the prompts cover heterogeneous unsafe categories (e.g., harmful instructions, sensitive topics, adversarial rephrasings) and include prompts that do not necessarily follow canonical jailbreak templates. This increased diversity and distributional variability makes similarity-based detection more challenging and provides a stress test for cross-lingual transfer.

Size: 1,000 prompts (500 safe / 500 unsafe)

Columns:
- `text`: original prompt
- `label`: `0` = safe, `1` = unsafe
- `translation`: the prompt translated into Chinese by `facebook/m2m100_418M`
- `score_zh_model`: cosine similarity score with the [codebook](https://huggingface.co/datasets/shalanova/codebook_embeddings)
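As a minimal sketch of how the `label` and `score_zh_model` columns might be used together, the snippet below scores a simple threshold detector. It rests on assumptions the description does not state: that a higher `score_zh_model` indicates a prompt more likely to be unsafe, and that `0.5` is a sensible cutoff. The function name `evaluate_threshold` and the toy rows are hypothetical and are not taken from the dataset.

```python
from typing import Dict, Iterable, Mapping


def evaluate_threshold(rows: Iterable[Mapping], threshold: float) -> Dict[str, float]:
    """Score a threshold detector on rows with `label` and `score_zh_model`.

    Assumption: a score at or above `threshold` is predicted unsafe (label 1).
    """
    tp = fp = tn = fn = 0
    for row in rows:
        predicted_unsafe = row["score_zh_model"] >= threshold
        is_unsafe = row["label"] == 1
        if predicted_unsafe and is_unsafe:
            tp += 1
        elif predicted_unsafe and not is_unsafe:
            fp += 1
        elif not predicted_unsafe and is_unsafe:
            fn += 1
        else:
            tn += 1

    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical rows that mimic the column layout; the values are invented.
toy_rows = [
    {"text": "safe prompt A", "label": 0, "translation": "...", "score_zh_model": 0.10},
    {"text": "safe prompt B", "label": 0, "translation": "...", "score_zh_model": 0.55},
    {"text": "unsafe prompt C", "label": 1, "translation": "...", "score_zh_model": 0.40},
    {"text": "unsafe prompt D", "label": 1, "translation": "...", "score_zh_model": 0.80},
]

metrics = evaluate_threshold(toy_rows, threshold=0.5)
print(metrics)
```

On the real data, the rows could come from `datasets.load_dataset("shalanova/benchmark-4-chinese-m2m")`. The split names are not given in the description, so inspect the returned object before indexing into it.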
Provider:
shalanova



