
ddidacus/guard-glp-data

Hugging Face | Updated 2026-04-29 | Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/ddidacus/guard-glp-data
Description:
guard-glp-data is a prompt-level safety dataset for training and evaluating large language model (LLM) guardrails, with a focus on adversarial/jailbreak robustness. It has three splits: train (10,000 samples), calibration (2,048 samples), and test (2,048 samples), each with a 1:1 ratio of benign to malicious prompts. Each row has two fields: prompt (the user-facing input text) and adversarial (a boolean indicating whether the prompt is harmful/jailbreak rather than benign). The data is drawn from several public datasets, including allenai/wildjailbreak, JailbreakBench artifacts, HarmEval, and SG-Bench-malicious-instructions. The split construction notes describe the sampling strategy and sources for each split. The dataset is intended for training prompt-level safety classifiers, calibrating confidence thresholds for guardrail systems, and evaluating robustness against adversarial/jailbreak prompts.
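One of the intended uses above is calibrating a guardrail's confidence threshold. The following is a minimal sketch of that step, not the dataset author's method: it picks the lowest threshold whose false-positive rate on benign calibration prompts stays within a budget. The toy rows, the scores, and the helper names (calibrate_threshold, false_positive_rate, true_positive_rate) are hypothetical; only the prompt and adversarial field names come from the description. In practice the rows would come from the calibration split and the scores from your own classifier.

```python
from typing import List, Sequence

# Toy stand-in for the calibration split. Real rows would carry the same
# two fields (prompt, adversarial); the contents here are made up.
calibration = [
    {"prompt": "How do I bake bread?", "adversarial": False},
    {"prompt": "Summarize this article.", "adversarial": False},
    {"prompt": "Translate 'hello' to French.", "adversarial": False},
    {"prompt": "Write a poem about rain.", "adversarial": False},
    {"prompt": "<jailbreak prompt 1>", "adversarial": True},
    {"prompt": "<jailbreak prompt 2>", "adversarial": True},
    {"prompt": "<jailbreak prompt 3>", "adversarial": True},
    {"prompt": "<jailbreak prompt 4>", "adversarial": True},
]

# Hypothetical classifier scores in [0, 1], one per row, higher = more
# likely adversarial. A real guardrail model would produce these.
scores: List[float] = [0.10, 0.30, 0.20, 0.60, 0.55, 0.80, 0.90, 0.95]
labels: List[bool] = [row["adversarial"] for row in calibration]


def false_positive_rate(scores: Sequence[float], labels: Sequence[bool], threshold: float) -> float:
    """Fraction of benign prompts flagged (score >= threshold)."""
    benign = [s for s, y in zip(scores, labels) if not y]
    if not benign:
        raise ValueError("need at least one benign example")
    return sum(s >= threshold for s in benign) / len(benign)


def true_positive_rate(scores: Sequence[float], labels: Sequence[bool], threshold: float) -> float:
    """Fraction of adversarial prompts flagged (score >= threshold)."""
    harmful = [s for s, y in zip(scores, labels) if y]
    if not harmful:
        raise ValueError("need at least one adversarial example")
    return sum(s >= threshold for s in harmful) / len(harmful)


def calibrate_threshold(scores: Sequence[float], labels: Sequence[bool], max_fpr: float = 0.05) -> float:
    """Lowest observed score usable as a threshold with FPR <= max_fpr.

    Returns infinity (flag nothing) if no observed score meets the budget.
    """
    for candidate in sorted(set(scores)):
        if false_positive_rate(scores, labels, candidate) <= max_fpr:
            return candidate
    return float("inf")


threshold = calibrate_threshold(scores, labels, max_fpr=0.25)
print("threshold:", threshold)
print("FPR:", false_positive_rate(scores, labels, threshold))
print("TPR:", true_positive_rate(scores, labels, threshold))
```

The threshold chosen this way should then be frozen and evaluated on the test split, so that the reported false-positive and detection rates are not tuned on the data they are measured on.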
Provider:
ddidacus