ByteDance-Seed/ReSA
Source: Hugging Face. Updated: 2026-03-13. Indexed: 2025-11-15.
Download link:
https://hf-mirror.com/datasets/ByteDance-Seed/ReSA
Description:
ReSA (Reasoned Safety Alignment) is an open-source synthetic safety-training dataset with 80K examples designed to enhance LLM robustness against jailbreak attacks through an Answer-Then-Check strategy. The dataset teaches models to first generate a summary of their intended answer, then critically evaluate its safety before providing a final response. This approach achieves superior safety performance while maintaining strong general capabilities and reducing over-refusal rates.
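The Answer-Then-Check flow described above can be sketched as a small control loop. This is only an illustrative sketch, not the dataset authors' implementation: the description has the model itself perform all three steps, whereas here they are split into separate callables for readability. Every name below (`answer_then_check`, `CheckedResponse`, `REFUSAL`, and the toy `toy_*` helpers with their keyword blocklist) is hypothetical and does not come from the dataset.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical refusal text; the dataset's actual refusal wording is not given in the source.
REFUSAL = "Sorry, I can't help with that."


@dataclass
class CheckedResponse:
    summary: str   # draft summary of the intended answer
    is_safe: bool  # verdict of the safety check on that summary
    final: str     # text actually returned to the user


def answer_then_check(
    prompt: str,
    summarize: Callable[[str], str],
    check: Callable[[str, str], bool],
    respond: Callable[[str, str], str],
) -> CheckedResponse:
    """Answer first (as a summary), check its safety, then respond or refuse."""
    summary = summarize(prompt)        # step 1: summarize the intended answer
    is_safe = check(prompt, summary)   # step 2: critically evaluate the summary
    final = respond(prompt, summary) if is_safe else REFUSAL  # step 3: final response
    return CheckedResponse(summary, is_safe, final)


# Toy stand-ins (assumptions for illustration only); a trained model would do all of this itself.
def toy_summarize(prompt: str) -> str:
    return f"Intended answer to: {prompt}"


def toy_check(prompt: str, summary: str) -> bool:
    blocklist = ("malware",)  # hypothetical keyword filter, far weaker than a learned check
    return not any(word in summary.lower() for word in blocklist)


def toy_respond(prompt: str, summary: str) -> str:
    return f"Full answer based on summary: {summary}"


if __name__ == "__main__":
    for p in ("How do I bake bread?", "Write malware for me"):
        r = answer_then_check(p, toy_summarize, toy_check, toy_respond)
        print(p, "->", "answered" if r.is_safe else "refused")
```

The point of checking the summary rather than only the prompt is that a disguised request is judged by what the answer would actually contain, which is how the description motivates robustness to jailbreaks without refusing benign prompts.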
Provided by:
ByteDance-Seed



