WangResearchLab/SteeringSafety
Source: Hugging Face | Updated: 2025-10-16 | Indexed: 2025-11-01
Download link:
https://hf-mirror.com/datasets/WangResearchLab/SteeringSafety
Description:
SteeringSafety is a benchmark suite for evaluating representation steering methods across multiple safety perspectives. It provides a collection of 17 datasets covering 7 perspectives for measuring safety behaviors. It also implements a modular code framework that realizes a taxonomy of training-free steering methods using standardized, interchangeable components. Each dataset is split 40/10/50 into train/validation/test sets.
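To make the 40/10/50 proportions concrete, the following is a minimal sketch of how such a split could be produced from a list of examples. It is illustrative only: the function name `split_40_10_50`, the seeded shuffle, and the rounding rule are assumptions, not the procedure the dataset authors used, and the published datasets already ship with their splits.

```python
import random


def split_40_10_50(items, seed=0):
    """Hypothetical helper: shuffle deterministically, then split 40/10/50.

    Integer arithmetic avoids floating-point rounding; any remainder
    from the floor division ends up in the test portion.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 40 // 100
    n_val = n * 10 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


train, val, test = split_40_10_50(range(100))
print(len(train), len(val), len(test))
```

Under this split the test portion is the largest, so most examples are held out for evaluation and only 40% are available for deriving steering directions.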
Provider:
WangResearchLab



