scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m
Source: Hugging Face. Updated 2026-03-12; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m
Description:
---
license: cc-by-4.0
task_categories:
- feature-extraction
tags:
- llama
- sparse-autoencoder
- sae
- interpretability
- mechanistic-interpretability
- activations
- lmsys-chat
- goodfire
size_categories:
- 1M<n<10M
configs:
- config_name: feature-activations
  data_files:
  - split: train
    path: feature-activations/*.parquet
- config_name: prompt-level-features
  data_files:
  - split: train
    path: prompt-level-features/*.parquet
- config_name: llm-explanations-input
  data_files:
  - split: train
    path: llm-explanations-input/*.parquet
- config_name: llm-explanations-output
  default: true
  data_files:
  - split: train
    path: llm-explanations-output/*.parquet
---
# SAE Feature Activations — Llama 3.1 8B Instruct, Layer 19 (LMSYS-Chat-1M)
This dataset contains **Sparse Autoencoder (SAE) feature activations** extracted from layer 19 of [Meta's Llama 3.1 8B Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on conversations from [LMSYS-Chat-1M](https://huggingface.co/datasets/lmsys/lmsys-chat-1m).
It also includes natural-language explanations of the features, generated by GPT OSS 120B; see subset 4 (`llm-explanations-output`) for details.
The SAE used is [Goodfire/Llama-3.1-8B-Instruct-SAE-l19](https://huggingface.co/Goodfire/Llama-3.1-8B-Instruct-SAE-l19), which decomposes layer-19 residual stream activations into interpretable sparse features.
## Subsets
This dataset has **four subsets** (configs), each serving a different purpose:
### 1. `feature-activations` — Sparse per-token SAE activations (2.3 GB)
Every token where an SAE feature fired, stored as a sparse table.
```python
from datasets import load_dataset
ds = load_dataset("scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m", "feature-activations", split="train")
```
| Column | Type | Description |
|--------|------|-------------|
| `layer_id` | int32 | Always 19 |
| `feature_id` | int32 | SAE feature index |
| `activation_value` | float32 | Activation strength |
| `prompt_id` | string | Links to the source prompt |
| `token_position` | int32 | Token index in the sequence |
**Use cases:** Find which features fire on specific tokens, analyze feature co-occurrence, compute feature statistics.
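
The statistics and co-occurrence use cases can be sketched in plain Python. This is a minimal illustration, not part of the dataset tooling: the toy rows below only mimic the column layout of this subset (all values are made up), and the helper names `feature_stats` and `cooccurrence` are hypothetical. Rows obtained by iterating over the loaded dataset are dicts with the same keys, so the same functions should apply to them.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy rows with the same columns as `feature-activations` (values are made up).
rows = [
    {"layer_id": 19, "feature_id": 7, "activation_value": 0.5, "prompt_id": "a", "token_position": 0},
    {"layer_id": 19, "feature_id": 9, "activation_value": 1.5, "prompt_id": "a", "token_position": 0},
    {"layer_id": 19, "feature_id": 7, "activation_value": 1.0, "prompt_id": "a", "token_position": 3},
    {"layer_id": 19, "feature_id": 7, "activation_value": 2.0, "prompt_id": "b", "token_position": 1},
    {"layer_id": 19, "feature_id": 9, "activation_value": 0.5, "prompt_id": "b", "token_position": 1},
]


def feature_stats(rows):
    """Return {feature_id: (fire_count, mean_activation, max_activation)}."""
    values = defaultdict(list)
    for row in rows:
        values[row["feature_id"]].append(row["activation_value"])
    return {fid: (len(v), sum(v) / len(v), max(v)) for fid, v in values.items()}


def cooccurrence(rows):
    """Count how often two features fire on the same (prompt_id, token_position)."""
    by_token = defaultdict(set)
    for row in rows:
        by_token[(row["prompt_id"], row["token_position"])].add(row["feature_id"])
    pairs = Counter()
    for features in by_token.values():
        # Sort so each unordered pair is always counted under the same key.
        pairs.update(combinations(sorted(features), 2))
    return pairs


stats = feature_stats(rows)
pairs = cooccurrence(rows)
print(stats)
print(pairs.most_common(3))
```

Both helpers make a single pass over the rows and keep everything in memory, so for the full 2.3 GB table it may be preferable to restrict the input to a subset of prompts or features first.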
### 2. `prompt-level-features` — Per-prompt aggregated data (47.6 GB)
Each row is one prompt with its raw activations, SAE decomposition, and active feature list.
```python
ds = load_dataset("scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m", "prompt-level-features", split="train")
```
| Column | Type | Description |
|--------|------|-------------|
| `prompt_id` | string | Unique conversation ID |
| `prompt` | string | Full conversation text (JSON array of messages) |
| `sampled_positions` | list\<int32\> | Token positions that were sampled for analysis |
| `activations_layer19` | binary | Raw layer-19 residual stream activations (compressed) |
| `sae_representation` | binary | SAE feature decomposition (JSON, compressed) — list of [feature_id, activation_value] pairs per position |
| `active_sae_feature_ids` | list\<int32\> | All SAE feature IDs active anywhere in this prompt |
**Use cases:** Prompt-level feature analysis, clustering prompts by active features, studying which features characterize different conversation types.
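
One simple way to compare prompts by their active features is Jaccard similarity over the `active_sae_feature_ids` lists. The sketch below is only an illustration under that assumption: the rows are toy data carrying just the two columns used, and `jaccard` and `most_similar` are hypothetical helper names, not functions shipped with the dataset.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of feature IDs (1.0 = identical sets)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Toy rows: only the two columns used here; IDs and feature lists are made up.
prompts = [
    {"prompt_id": "p1", "active_sae_feature_ids": [1, 2, 3, 4]},
    {"prompt_id": "p2", "active_sae_feature_ids": [3, 4, 5, 6]},
    {"prompt_id": "p3", "active_sae_feature_ids": [1, 2, 3]},
]


def most_similar(query, candidates):
    """Return (prompt_id, similarity) of the candidate closest to `query`."""
    scored = [
        (c["prompt_id"], jaccard(query["active_sae_feature_ids"], c["active_sae_feature_ids"]))
        for c in candidates
        if c["prompt_id"] != query["prompt_id"]
    ]
    return max(scored, key=lambda item: item[1])


best = most_similar(prompts[0], prompts)
print(best)
```

A pairwise similarity like this could feed a standard clustering routine; for a subset of this size (47.6 GB), selecting only the `prompt_id` and `active_sae_feature_ids` columns before comparing is likely to matter.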
### 3. `llm-explanations-input` — Top activating examples per feature (241 MB)
The highest-activating prompts for each SAE feature, used as input for generating natural-language explanations.
```python
ds = load_dataset("scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m", "llm-explanations-input", split="train")
```
| Column | Type | Description |
|--------|------|-------------|
| `feature_id` | int64 | SAE feature index |
| `rank` | int64 | Rank among top activations (1 = highest) |
| `prompt_text` | string | The prompt text where this feature activated strongly |
| `activation_value` | float64 | Activation strength |
| `token_position` | int64 | Token position of peak activation |
**Use cases:** Understand what each SAE feature responds to by examining its top-activating examples.
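
To inspect a feature, it helps to group the rows by `feature_id` and order them by `rank`. The following is a minimal sketch on toy rows that mimic this subset's columns (texts and values are made up); `top_examples` is a hypothetical helper name.

```python
from collections import defaultdict

# Toy rows with the `llm-explanations-input` columns (text and values are made up).
rows = [
    {"feature_id": 5, "rank": 2, "prompt_text": "def add(a, b): return a + b",
     "activation_value": 3.1, "token_position": 1},
    {"feature_id": 5, "rank": 1, "prompt_text": "import numpy as np",
     "activation_value": 4.2, "token_position": 0},
    {"feature_id": 8, "rank": 1, "prompt_text": "Bonjour, comment allez-vous ?",
     "activation_value": 2.7, "token_position": 2},
]


def top_examples(rows, k=3):
    """Return {feature_id: up to k rows, ordered by rank (1 = highest activation)}."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["feature_id"]].append(row)
    return {fid: sorted(ex, key=lambda r: r["rank"])[:k] for fid, ex in grouped.items()}


examples = top_examples(rows, k=1)
for fid, ex in examples.items():
    for row in ex:
        print(f"feature {fid} rank {row['rank']}: {row['prompt_text']!r}")
```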
### 4. `llm-explanations-output` — Natural-language feature explanations (6.8 MB)
LLM-generated explanations describing what each SAE feature represents.
Explanations were generated with gpt-oss:120b using the following prompt:
```python
# Excerpt: i, activation, prompt_text, token_position, tokenizer, feature_id,
# formatted_examples, and extract_token_window are defined elsewhere (not shown).
context_window = extract_token_window(
    prompt_text,
    token_position,
    tokenizer,
    window_size=200
)
formatted_examples.append(
    f"Example {i} (activation: {activation:.3f}, position: {token_position}):\n{context_window}\n"
)

# Join the collected examples and build the prompt for one feature.
formatted_text = "\n".join(formatted_examples)
prompt = f"""Looking at these examples where a neural network feature (Feature {feature_id}) activates strongly:
{formatted_text}
What pattern or concept does this feature detect? Be concise and specific.
Respond with JSON containing:
- "explanation": 2-3 sentence detailed explanation of the pattern
- "concept": short label, 2-5 words"""
```
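
The excerpt above calls `extract_token_window` without showing it. The sketch below is one plausible reading, not the original helper: it assumes the window is split evenly around the peak token (`window_size // 2` tokens on each side, plus the token itself) and that the tokenizer exposes `encode` and `decode`. `WhitespaceTokenizer` is a hypothetical stand-in so the example runs without downloading a model.

```python
class WhitespaceTokenizer:
    """Stand-in tokenizer (hypothetical): one token per whitespace-separated word."""

    def __init__(self):
        self.vocab = []  # token id -> word

    def encode(self, text):
        ids = []
        for word in text.split():
            if word not in self.vocab:
                self.vocab.append(word)
            ids.append(self.vocab.index(word))
        return ids

    def decode(self, ids):
        return " ".join(self.vocab[i] for i in ids)


def extract_token_window(prompt_text, token_position, tokenizer, window_size=200):
    """Return text covering window_size // 2 tokens on each side of token_position.

    Assumption: the window is centred on the peak token and clipped at the
    sequence boundaries; the original helper may define it differently.
    """
    ids = tokenizer.encode(prompt_text)
    half = window_size // 2
    start = max(0, token_position - half)
    end = min(len(ids), token_position + half + 1)
    return tokenizer.decode(ids[start:end])


tok = WhitespaceTokenizer()
text = "the quick brown fox jumps over the lazy dog"
window = extract_token_window(text, token_position=4, tokenizer=tok, window_size=4)
print(window)
```

Whether `token_position` in the dataset counts special tokens such as a beginning-of-sequence marker is not stated here, so positions may need an offset when a real tokenizer is used.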
```python
ds = load_dataset("scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m", "llm-explanations-output", split="train")
```
| Column | Type | Description |
|--------|------|-------------|
| `feature_id` | int64 | SAE feature index |
| `explanation` | string | Detailed natural-language explanation of the feature |
| `concept` | string | Short concept label (e.g., "LLM-focused meta discussion") |
**Use cases:** Look up what any SAE feature means, build feature dashboards, map features to human-interpretable concepts.
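
Mapping feature IDs to labels amounts to building a dictionary keyed by `feature_id`. A minimal sketch follows, using toy rows shaped like this subset (the explanations and labels are made up); `label_features` is a hypothetical helper, and the fallback label covers the case where a feature has no explanation row.

```python
# Toy rows with the `llm-explanations-output` columns (labels are made up).
explanation_rows = [
    {"feature_id": 5, "explanation": "Fires on Python import statements.", "concept": "Python imports"},
    {"feature_id": 8, "explanation": "Fires on French greetings.", "concept": "French greetings"},
]

# Build the lookup once; each later query is a dictionary access.
concept_by_feature = {row["feature_id"]: row["concept"] for row in explanation_rows}


def label_features(feature_ids, concept_by_feature, default="(no explanation)"):
    """Pair each feature ID with its concept label, or `default` if none exists."""
    return [(fid, concept_by_feature.get(fid, default)) for fid in feature_ids]


labels = label_features([5, 8, 13], concept_by_feature)
for fid, concept in labels:
    print(f"{fid}: {concept}")
```

The same lookup can annotate the `active_sae_feature_ids` list of a prompt from the `prompt-level-features` subset, since both use the SAE feature index as the key.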
## Quick Start
```python
from datasets import load_dataset

# Load feature explanations to understand what features mean
explanations = load_dataset(
    "scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m",
    "llm-explanations-output", split="train"
)

# Find the first feature whose concept label mentions "code"
for row in explanations:
    if "code" in row["concept"].lower():
        print(f"Feature {row['feature_id']}: {row['concept']}")
        print(f"  {row['explanation'][:200]}...")
        break

# Load sparse activations to find where a feature fires
activations = load_dataset(
    "scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m",
    "feature-activations", split="train"
)

# Filter for a specific feature (42 is an arbitrary example ID)
feature_42 = activations.filter(lambda x: x["feature_id"] == 42)
print(f"Feature 42 fired {len(feature_42)} times")
```
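
Once the rows for a feature are available, a common next step is to find where it fires most strongly in each prompt. The sketch below shows one way to do that in a single pass; `peak_per_prompt` is a hypothetical helper and the rows are toy data with the `feature-activations` columns. Because it accepts any iterable of row dicts, it should also work on a dataset opened with `streaming=True`, which avoids loading the whole table.

```python
def peak_per_prompt(rows, feature_id):
    """Map prompt_id -> (token_position, activation_value) of the strongest firing."""
    peaks = {}
    for row in rows:
        if row["feature_id"] != feature_id:
            continue
        best = peaks.get(row["prompt_id"])
        if best is None or row["activation_value"] > best[1]:
            peaks[row["prompt_id"]] = (row["token_position"], row["activation_value"])
    return peaks


# Toy rows with the `feature-activations` columns (values are made up).
toy_rows = [
    {"layer_id": 19, "feature_id": 42, "activation_value": 0.25, "prompt_id": "a", "token_position": 2},
    {"layer_id": 19, "feature_id": 42, "activation_value": 1.75, "prompt_id": "a", "token_position": 6},
    {"layer_id": 19, "feature_id": 7, "activation_value": 9.0, "prompt_id": "a", "token_position": 1},
    {"layer_id": 19, "feature_id": 42, "activation_value": 0.5, "prompt_id": "b", "token_position": 0},
]

peaks = peak_per_prompt(toy_rows, feature_id=42)
print(peaks)
```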
## Details
- **Base model**: `meta-llama/Llama-3.1-8B-Instruct`
- **SAE model**: [Goodfire/Llama-3.1-8B-Instruct-SAE-l19](https://huggingface.co/Goodfire/Llama-3.1-8B-Instruct-SAE-l19)
- **Layer**: 19
- **Source corpus**: LMSYS-Chat-1M (~100K prompts)
- **Hidden dimension**: 4096
- **100 parquet shards** for feature-activations and prompt-level-features (10 workers × 10 batches)
- **10 parquet shards** for llm-explanations (1 per worker)
## License
This dataset is released under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/).
Provider:
scaleinvariant



