AISafety-Student/little-steer

Name: AISafety-Student/little-steer
Creator: AISafety-Student
Published: 2026-04-22 01:16:39
License: No description available

Hugging Face · Updated 2026-04-22 · Indexed 2026-04-26

Download link:

https://hf-mirror.com/datasets/AISafety-Student/little-steer




Summary:


The Little Steer dataset focuses on AI safety and control. It contains model responses to safety-relevant prompts, with behavioral annotations over the chain of thought, and is intended for research on activation-based safety monitoring with Representation Engineering (RepE). Prompts are drawn from public safety-related datasets, and responses are generated by several open-weights reasoning models. An LLM judge annotates each reasoning trace using a custom behavioral taxonomy whose labels are grouped into five categories, and assigns a safety score to each sentence. Each record includes the fields id, messages, annotations, model, judge, metadata, label_runs, and safety_runs. The dataset is licensed under CC BY 4.0 and is marked as a work in progress.
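As a rough illustration of how the per-sentence safety scores might be consumed, the sketch below averages them per generating model and tallies annotated sentences per label category. It is a minimal sketch under loudly stated assumptions: the catalog lists only top-level field names, so the nested layout of `annotations` used here (a list of dicts with `sentence`, `category`, and `safety_score` keys), the category names, and the score scale are all hypothetical, as are the helper names `mean_safety_by_model` and `label_counts` and the toy records. In practice the records would presumably be fetched with the Hugging Face `datasets` library (`load_dataset("AISafety-Student/little-steer")`), after which the key names would need to be adjusted to the real schema.

```python
from collections import Counter, defaultdict
from statistics import mean

# HYPOTHETICAL toy records. Only the top-level keys "id", "model", and
# "annotations" come from the catalog; the nested layout (sentence, category,
# safety_score), the category names, and the 0-1 score scale are assumptions.
TOY_RECORDS = [
    {
        "id": "toy-1",
        "model": "model-a",
        "annotations": [
            {"sentence": "First reasoning step.", "category": "cat-1", "safety_score": 0.0},
            {"sentence": "Second reasoning step.", "category": "cat-2", "safety_score": 1.0},
        ],
    },
    {
        "id": "toy-2",
        "model": "model-b",
        "annotations": [
            {"sentence": "Only reasoning step.", "category": "cat-1", "safety_score": 0.25},
        ],
    },
]


def mean_safety_by_model(records):
    """Average the per-sentence safety scores of all traces, grouped by model."""
    scores = defaultdict(list)
    for rec in records:
        for ann in rec["annotations"]:
            scores[rec["model"]].append(ann["safety_score"])
    # A model only gets a key once it has at least one scored sentence.
    return {model: mean(vals) for model, vals in scores.items()}


def label_counts(records):
    """Count how many annotated sentences fall into each behavioral category."""
    return Counter(ann["category"] for rec in records for ann in rec["annotations"])


if __name__ == "__main__":
    print(mean_safety_by_model(TOY_RECORDS))  # {'model-a': 0.5, 'model-b': 0.25}
    print(label_counts(TOY_RECORDS))          # Counter({'cat-1': 2, 'cat-2': 1})
```

Plain dicts are used because iterating a `datasets` split yields one dict per row; `label_runs` and `safety_runs` are ignored here because their structure is not described on this page.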

Provided by:

AISafety-Student
