shalanova/benchmark-3-arabic-gt

Name: shalanova/benchmark-3-arabic-gt
Creator: shalanova
Published: 2026-04-30 04:15:16
License: No description available

Source: Hugging Face | Updated: 2026-04-30 | Indexed: 2026-05-03

Download link:

https://hf-mirror.com/datasets/shalanova/benchmark-3-arabic-gt


Description:

The dataset is sourced from JailbreakBench/JBB-Behaviors. It covers heterogeneous unsafe categories (e.g., harmful instructions, sensitive topics, adversarial rephrasings), and its prompts do not necessarily follow canonical jailbreak templates. This added diversity and distributional variability make similarity-based detection more challenging and provide a stress test for cross-lingual transfer. The dataset contains 200 prompts (100 safe, 100 unsafe) with four columns: text (the original prompt), label (0 = safe, 1 = unsafe), translation (the prompt translated into Arabic with Google Translate), and score_ar_google (the cosine similarity score against the codebook). More information is available in the paper linked from the dataset page.
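As a rough illustration of how the label and score_ar_google columns could be used together, the sketch below applies a fixed threshold to the similarity score and tallies a confusion matrix. Several things here are assumptions rather than facts from the dataset card: that a higher score means "more likely unsafe", the threshold value 0.5, the helper names `evaluate_threshold` and `accuracy`, and the four rows themselves, which are hypothetical placeholders that only mimic the column layout and are not real dataset entries.

```python
from typing import Dict, List

# Hypothetical placeholder rows that follow the card's column layout.
# They are NOT real entries from the dataset.
rows: List[Dict] = [
    {"text": "placeholder prompt A", "label": 0,
     "translation": "<Arabic translation>", "score_ar_google": 0.12},
    {"text": "placeholder prompt B", "label": 0,
     "translation": "<Arabic translation>", "score_ar_google": 0.55},
    {"text": "placeholder prompt C", "label": 1,
     "translation": "<Arabic translation>", "score_ar_google": 0.48},
    {"text": "placeholder prompt D", "label": 1,
     "translation": "<Arabic translation>", "score_ar_google": 0.81},
]


def evaluate_threshold(rows: List[Dict], threshold: float) -> Dict[str, int]:
    """Flag a prompt as unsafe when its score is at or above `threshold`.

    Assumes higher similarity to the codebook indicates an unsafe prompt;
    the dataset card does not state the direction explicitly.
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for row in rows:
        predicted = 1 if row["score_ar_google"] >= threshold else 0
        actual = row["label"]
        if predicted == 1 and actual == 1:
            counts["tp"] += 1
        elif predicted == 1 and actual == 0:
            counts["fp"] += 1
        elif predicted == 0 and actual == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(counts: Dict[str, int]) -> float:
    """Fraction of prompts whose predicted label matches the true label."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0


if __name__ == "__main__":
    result = evaluate_threshold(rows, threshold=0.5)
    print(result, accuracy(result))
```

With the real data, the same functions could be run over all 200 rows while sweeping the threshold, which would show how well a single cutoff separates the two balanced classes after translation.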

Provided by:

shalanova
