ddidacus/guard-glp-data

Name: ddidacus/guard-glp-data
Creator: ddidacus
Published: 2026-04-29 22:09:21
License: not specified

Source: Hugging Face
Updated: 2026-04-29
Indexed: 2026-04-26

Download link:

https://hf-mirror.com/datasets/ddidacus/guard-glp-data

Description:

guard-glp-data is a prompt-level safety dataset for training and evaluating LLM guardrails, with a focus on adversarial / jailbreak robustness. It includes train, calibration, and test splits of 10,000, 2,048, and 2,048 samples respectively, each with a 1:1 ratio of benign to malicious prompts. Each row has two fields: prompt (the user-facing input text) and adversarial (a boolean that is true for harmful/jailbreak prompts and false for benign ones). The data is drawn from several public datasets, including allenai/wildjailbreak, JailbreakBench artifacts, HarmEval, and SG-Bench-malicious-instructions. The dataset's split-construction documentation details the sampling strategy and sources for each split. Intended uses are training prompt-level safety classifiers, calibrating confidence thresholds for guardrail systems, and evaluating robustness against adversarial / jailbreak prompts.
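
The two-field schema and the 1:1 balance described above can be checked with a short script. The following is a minimal sketch, not the dataset author's tooling: the helper names (summarize_split, is_balanced, load_split) are hypothetical, the toy rows are hand-made placeholders rather than real dataset entries, and load_split assumes that the split identifiers on the Hub match the names used in the description (train, calibration, test), which this listing does not confirm.

```python
from collections import Counter
from typing import Iterable, Mapping

# Split sizes as stated in the description above.
EXPECTED_SIZES = {"train": 10_000, "calibration": 2_048, "test": 2_048}


def summarize_split(rows: Iterable[Mapping[str, object]]) -> Counter:
    """Validate the two-field schema and count benign vs. adversarial rows."""
    counts: Counter = Counter()
    for i, row in enumerate(rows):
        prompt = row.get("prompt")
        label = row.get("adversarial")
        if not isinstance(prompt, str):
            raise TypeError(f"row {i}: 'prompt' should be a string")
        if not isinstance(label, bool):
            raise TypeError(f"row {i}: 'adversarial' should be a boolean")
        counts["adversarial" if label else "benign"] += 1
    return counts


def is_balanced(counts: Counter) -> bool:
    """True when a non-empty split has a 1:1 benign-to-adversarial ratio."""
    return sum(counts.values()) > 0 and counts["benign"] == counts["adversarial"]


def load_split(name: str):
    """Fetch one split from the Hub.

    Assumption: the split identifier equals the name used in the description.
    """
    from datasets import load_dataset  # third-party; imported lazily

    return load_dataset("ddidacus/guard-glp-data", split=name)


# Hand-made stand-in rows (NOT taken from the dataset) to exercise the checks.
toy_rows = [
    {"prompt": "How do I bake sourdough bread?", "adversarial": False},
    {"prompt": "<placeholder for a jailbreak-style prompt>", "adversarial": True},
]
toy_counts = summarize_split(toy_rows)
print(toy_counts, is_balanced(toy_counts))
```

With the datasets library installed and network access available, summarize_split(load_split("train")) would be expected to report 5,000 rows per class if the stated size and ratio hold; comparing the total against EXPECTED_SIZES gives a quick sanity check before training or calibration.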

Provided by:

ddidacus
