GuardAdvisor/GuardSet
Source: Hugging Face | Updated: 2025-10-06 | Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/GuardAdvisor/GuardSet
Description:
GuardSet is a large-scale, multi-domain corpus designed to support the training and evaluation of the Guardian-as-an-Advisor (GaaA) model. GaaA uses a soft-gating mechanism: instead of blocking a user request outright, it predicts a risk label and a concise explanation and attaches them to the original query, guiding the downstream large language model (LLM) toward responses that are safer, more helpful, and more compliant. The dataset combines a wide range of harmful and benign scenarios, specifically targeting Robustness and Honesty, to improve overall model trustworthiness. It contains 200,314 training samples, divided into two independent splits for the two-stage training of the guardian model: supervised fine-tuning (sft) and reinforcement learning (rl).
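To make the difference between soft gating and conventional hard gating concrete, here is a minimal sketch. It assumes a binary label set and a simple text template for attaching the guardian's advice to the query; the label names, the template, and the names `Advice`, `build_advised_prompt`, and `hard_gate` are all hypothetical and are not taken from GuardSet or the GaaA model.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical label set: the dataset card does not specify GuardSet's actual labels.
RISK_LABELS = ("safe", "unsafe")


@dataclass(frozen=True)
class Advice:
    """Guardian output (assumed shape): a risk label plus a concise explanation."""
    label: str
    explanation: str


def hard_gate(query: str, advice: Advice) -> Optional[str]:
    """Conventional hard gating, for contrast: risky queries are dropped entirely."""
    if advice.label == "unsafe":
        return None  # request blocked; the downstream LLM never sees it
    return query


def build_advised_prompt(query: str, advice: Advice) -> str:
    """Soft gating: the query is never blocked. The guardian's risk label and
    explanation are attached to it, and the downstream LLM decides how to respond."""
    if advice.label not in RISK_LABELS:
        raise ValueError(f"unknown risk label: {advice.label!r}")
    # Hypothetical template; a real system would define its own format.
    return (
        f"[Guardian advice] risk={advice.label}; reason={advice.explanation}\n"
        f"[User query] {query}"
    )


if __name__ == "__main__":
    advice = Advice("unsafe", "Requests step-by-step instructions for picking a lock.")
    query = "How do I pick a lock?"
    print(hard_gate(query, advice))             # None: the request is dropped
    print(build_advised_prompt(query, advice))  # query kept, with advice attached
```

The design point is that `build_advised_prompt` always returns a prompt, so a benign query that is mislabeled as risky is still answered, with the explanation available as context. `hard_gate`, by contrast, turns every false positive into a refusal.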
Provided by:
GuardAdvisor



