AISafety-Student/little-steer

Source: Hugging Face. Updated 2026-04-22. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/AISafety-Student/little-steer
Description:
The Little Steer dataset focuses on AI safety and control. It contains model responses to safety-relevant prompts, with behavioral annotations over the chain-of-thought, and is intended for research on activation-based safety monitoring using Representation Engineering (RepE). Prompts are drawn from public safety-related datasets, and responses are generated by several open-weights reasoning models. Each reasoning trace is annotated by an LLM judge using a custom behavioral taxonomy: labels are grouped into five categories, and every sentence is assigned a safety score. The schema includes fields such as id, messages, annotations, model, judge, metadata, label_runs, and safety_runs. The dataset is licensed under CC BY 4.0 and is marked as a work in progress.
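Because the dataset is marked as a work in progress, it may help to check that a record carries the fields named in the description before relying on them. The following is a minimal sketch, not the maintainers' tooling: `LISTED_FIELDS` and `missing_fields` are hypothetical names introduced here, the field list is only the one given in the description (the real schema may contain more), and the example record uses placeholder values because the nested structure of each field is not documented on this page.

```python
from typing import Any, Mapping

# Top-level fields named in the description; the real schema may contain more.
LISTED_FIELDS = (
    "id",
    "messages",
    "annotations",
    "model",
    "judge",
    "metadata",
    "label_runs",
    "safety_runs",
)


def missing_fields(record: Mapping[str, Any]) -> list[str]:
    """Return the listed fields that are absent from a record, in listed order."""
    return [name for name in LISTED_FIELDS if name not in record]


if __name__ == "__main__":
    # Hypothetical record with placeholder values; the actual nested
    # structure of each field is an unknown here and is not modeled.
    example = {name: None for name in LISTED_FIELDS}
    del example["safety_runs"]
    print(missing_fields(example))  # ['safety_runs']
```

In practice the records would presumably come from the Hugging Face `datasets` library, for example via `load_dataset("AISafety-Student/little-steer")`. The available splits and configurations are not stated on this page, so that call is left out of the sketch.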
Provider:
AISafety-Student