scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m

Name: scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m
Creator: scaleinvariant
Published: 2026-03-12 18:08:02
License: CC BY 4.0

Hugging Face · Updated 2026-03-12 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m


Description:

---
license: cc-by-4.0
task_categories:
- feature-extraction
tags:
- llama
- sparse-autoencoder
- sae
- interpretability
- mechanistic-interpretability
- activations
- lmsys-chat
- goodfire
size_categories:
- 1M<n<10M
configs:
- config_name: feature-activations
  data_files:
  - split: train
    path: feature-activations/*.parquet
- config_name: prompt-level-features
  data_files:
  - split: train
    path: prompt-level-features/*.parquet
- config_name: llm-explanations-input
  data_files:
  - split: train
    path: llm-explanations-input/*.parquet
- config_name: llm-explanations-output
  default: true
  data_files:
  - split: train
    path: llm-explanations-output/*.parquet
---

# SAE Feature Activations — Llama 3.1 8B Instruct, Layer 19 (LMSYS-Chat-1M)

This dataset contains **Sparse Autoencoder (SAE) feature activations** extracted from layer 19 of [Meta's Llama 3.1 8B Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on conversations from [LMSYS-Chat-1M](https://huggingface.co/datasets/lmsys/lmsys-chat-1m). It also includes natural-language explanations of the features, generated by GPT OSS 120B (see subset 4 for details).

The SAE used is [Goodfire/Llama-3.1-8B-Instruct-SAE-l19](https://huggingface.co/Goodfire/Llama-3.1-8B-Instruct-SAE-l19). It decomposes layer-19 residual stream activations into interpretable sparse features.

## Subsets

This dataset has **four subsets** (configs), and each one serves a different purpose:

### 1. `feature-activations` — Sparse per-token SAE activations (2.3 GB)

This subset records every token where an SAE feature fired. It is stored as a sparse table with one row per (token, feature) firing.
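Each row records a single (prompt, token, feature) firing. Because of that, the feature statistics and co-occurrence analyses this table supports reduce to simple group-bys.

The sketch below is illustrative only. The toy rows and the helper names `feature_stats` and `cooccurrence` are hypothetical. It assumes rows arrive as plain dicts keyed by the documented column names, which is what iterating a `datasets.Dataset` yields. For the full 2.3 GB table, passing `streaming=True` to `load_dataset` returns an iterable of the same dicts, so the helpers can consume it without materializing everything in memory.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy rows using the documented column names (values are invented).
example_rows = [
    {"layer_id": 19, "feature_id": 7, "activation_value": 1.5, "prompt_id": "a", "token_position": 0},
    {"layer_id": 19, "feature_id": 9, "activation_value": 0.5, "prompt_id": "a", "token_position": 0},
    {"layer_id": 19, "feature_id": 7, "activation_value": 2.0, "prompt_id": "a", "token_position": 3},
    {"layer_id": 19, "feature_id": 9, "activation_value": 0.25, "prompt_id": "b", "token_position": 1},
    {"layer_id": 19, "feature_id": 7, "activation_value": 0.75, "prompt_id": "b", "token_position": 1},
]


def feature_stats(rows):
    """Return (firing count, max activation) per feature_id."""
    counts = Counter()
    max_act = {}
    for r in rows:
        fid = r["feature_id"]
        counts[fid] += 1
        max_act[fid] = max(max_act.get(fid, float("-inf")), r["activation_value"])
    return counts, max_act


def cooccurrence(rows):
    """Count feature pairs that fire on the same token (prompt_id, token_position)."""
    features_at = defaultdict(set)
    for r in rows:
        features_at[(r["prompt_id"], r["token_position"])].add(r["feature_id"])
    pairs = Counter()
    for feats in features_at.values():
        # Sort so each unordered pair is counted under one canonical key.
        for a, b in combinations(sorted(feats), 2):
            pairs[(a, b)] += 1
    return pairs


counts, max_act = feature_stats(example_rows)
print(counts)                      # Counter({7: 3, 9: 2})
print(cooccurrence(example_rows))  # Counter({(7, 9): 2})
```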
```python
from datasets import load_dataset

ds = load_dataset("scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m", "feature-activations", split="train")
```

| Column | Type | Description |
|--------|------|-------------|
| `layer_id` | int32 | Always 19 |
| `feature_id` | int32 | SAE feature index |
| `activation_value` | float32 | Activation strength |
| `prompt_id` | string | Links to the source prompt |
| `token_position` | int32 | Token index in the sequence |

**Use cases:** Find which features fire on specific tokens, analyze feature co-occurrence, compute feature statistics.

### 2. `prompt-level-features` — Per-prompt aggregated data (47.6 GB)

Each row is one prompt with its raw activations, SAE decomposition, and active feature list.

```python
ds = load_dataset("scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m", "prompt-level-features", split="train")
```

| Column | Type | Description |
|--------|------|-------------|
| `prompt_id` | string | Unique conversation ID |
| `prompt` | string | Full conversation text (JSON array of messages) |
| `sampled_positions` | list\<int32\> | Token positions that were sampled for analysis |
| `activations_layer19` | binary | Raw layer-19 residual stream activations (compressed) |
| `sae_representation` | binary | SAE feature decomposition (JSON, compressed): list of [feature_id, activation_value] pairs per position |
| `active_sae_feature_ids` | list\<int32\> | All SAE feature IDs active anywhere in this prompt |

**Use cases:** Prompt-level feature analysis, clustering prompts by active features, studying which features characterize different conversation types.

### 3. `llm-explanations-input` — Top activating examples per feature (241 MB)

The highest-activating prompts for each SAE feature, used as input for generating natural-language explanations.
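The card does not describe how the top-activating examples were selected. The sketch below shows one plausible way to derive such a table from `feature-activations`. It assumes each prompt is counted once per feature, at its peak token, and that ranks start at 1 as in the `rank` column.

The function name `top_k_examples` and the choice of `k` are hypothetical. The real subset stores `prompt_text` rather than `prompt_id`, so matching it would presumably require joining against the `prompt` column of `prompt-level-features`.

```python
import heapq
from collections import defaultdict


def top_k_examples(rows, k=3):
    """Hypothetical sketch: per feature, keep the k prompts with the highest peak activation."""
    # Reduce to the peak activation (and its token position) for each (feature, prompt) pair.
    peak = {}
    for r in rows:
        key = (r["feature_id"], r["prompt_id"])
        if key not in peak or r["activation_value"] > peak[key][0]:
            peak[key] = (r["activation_value"], r["token_position"])

    # Group the per-prompt peaks by feature.
    by_feature = defaultdict(list)
    for (feature_id, prompt_id), (value, position) in peak.items():
        by_feature[feature_id].append((value, prompt_id, position))

    # Rank each feature's prompts by peak activation, 1 = highest.
    out = []
    for feature_id in sorted(by_feature):
        best = heapq.nlargest(k, by_feature[feature_id])
        for rank, (value, prompt_id, position) in enumerate(best, start=1):
            out.append({
                "feature_id": feature_id,
                "rank": rank,
                "prompt_id": prompt_id,
                "activation_value": value,
                "token_position": position,
            })
    return out


# Invented example rows in the feature-activations column layout.
example_rows = [
    {"feature_id": 7, "activation_value": 1.5, "prompt_id": "a", "token_position": 0},
    {"feature_id": 7, "activation_value": 2.0, "prompt_id": "a", "token_position": 3},
    {"feature_id": 7, "activation_value": 0.75, "prompt_id": "b", "token_position": 1},
    {"feature_id": 7, "activation_value": 1.0, "prompt_id": "c", "token_position": 5},
    {"feature_id": 9, "activation_value": 0.5, "prompt_id": "a", "token_position": 0},
]

for row in top_k_examples(example_rows, k=2):
    print(row["feature_id"], row["rank"], row["prompt_id"], row["activation_value"])
```

Using `heapq.nlargest` avoids sorting every prompt for a feature when only the top few are kept. That matters when popular features fire on many thousands of prompts.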
```python
ds = load_dataset("scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m", "llm-explanations-input", split="train")
```

| Column | Type | Description |
|--------|------|-------------|
| `feature_id` | int64 | SAE feature index |
| `rank` | int64 | Rank among top activations (1 = highest) |
| `prompt_text` | string | The prompt text where this feature activated strongly |
| `activation_value` | float64 | Activation strength |
| `token_position` | int64 | Token position of peak activation |

**Use cases:** Understand what each SAE feature responds to by examining its top-activating examples.

### 4. `llm-explanations-output` — Natural-language feature explanations (6.8 MB)

LLM-generated explanations describing what each SAE feature represents. The explanations were generated with gpt-oss:120b using this prompt:

```python
# Excerpt from the generation script. `extract_token_window`, `tokenizer`,
# `formatted_examples`, and the per-example variables (`i`, `activation`,
# `prompt_text`, `token_position`) are defined elsewhere in that script.

# Per top-activating example (loop not shown): take a token window around the peak position
context_window = extract_token_window(
    prompt_text, token_position, tokenizer, window_size=200
)
formatted_examples.append(
    f"Example {i} (activation: {activation:.3f}, position: {token_position}):\n{context_window}\n"
)

# Per feature: join the formatted examples and build the prompt
formatted_text = "\n".join(formatted_examples)
prompt = f"""Looking at these examples where a neural network feature (Feature {feature_id}) activates strongly:

{formatted_text}

What pattern or concept does this feature detect? Be concise and specific.

Respond with JSON containing:
- "explanation": 2-3 sentence detailed explanation of the pattern
- "concept": short label, 2-5 words"""
```

```python
ds = load_dataset("scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m", "llm-explanations-output", split="train")
```

| Column | Type | Description |
|--------|------|-------------|
| `feature_id` | int64 | SAE feature index |
| `explanation` | string | Detailed natural-language explanation of the feature |
| `concept` | string | Short concept label (e.g., "LLM-focused meta discussion") |

**Use cases:** Look up what any SAE feature means, build feature dashboards, map features to human-interpretable concepts.

## Quick Start

```python
from datasets import load_dataset

# Load feature explanations to understand what features mean
explanations = load_dataset(
    "scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m",
    "llm-explanations-output",
    split="train"
)

# Find a feature by concept and remember its ID
feature_id = None
for row in explanations:
    if "code" in row["concept"].lower():
        feature_id = row["feature_id"]
        print(f"Feature {feature_id}: {row['concept']}")
        print(f"  {row['explanation'][:200]}...")
        break

# Load sparse activations to find where that feature fires
activations = load_dataset(
    "scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m",
    "feature-activations",
    split="train"
)

# Keep only the activation rows of the feature found above
if feature_id is not None:
    hits = activations.filter(lambda x: x["feature_id"] == feature_id)
    print(f"Feature {feature_id} fired {len(hits)} times")
```

## Details

- **Base model**: `meta-llama/Llama-3.1-8B-Instruct`
- **SAE model**: [Goodfire/Llama-3.1-8B-Instruct-SAE-l19](https://huggingface.co/Goodfire/Llama-3.1-8B-Instruct-SAE-l19)
- **Layer**: 19
- **Source corpus**: LMSYS-Chat-1M (~100K prompts)
- **Hidden dimension**: 4096
- **100 parquet shards** for feature-activations and prompt-level-features (10 workers × 10 batches)
- **10 parquet shards** for llm-explanations (1 per worker)

## License

This dataset is released under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/).

