
scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m

Source: Hugging Face. Updated 2026-03-12; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m
Resource description:
---
license: cc-by-4.0
task_categories:
- feature-extraction
tags:
- llama
- sparse-autoencoder
- sae
- interpretability
- mechanistic-interpretability
- activations
- lmsys-chat
- goodfire
size_categories:
- 1M<n<10M
configs:
- config_name: feature-activations
  data_files:
  - split: train
    path: feature-activations/*.parquet
- config_name: prompt-level-features
  data_files:
  - split: train
    path: prompt-level-features/*.parquet
- config_name: llm-explanations-input
  data_files:
  - split: train
    path: llm-explanations-input/*.parquet
- config_name: llm-explanations-output
  default: true
  data_files:
  - split: train
    path: llm-explanations-output/*.parquet
---

# SAE Feature Activations: Llama 3.1 8B Instruct, Layer 19 (LMSYS-Chat-1M)

This dataset contains **Sparse Autoencoder (SAE) feature activations** extracted from layer 19 of [Meta's Llama 3.1 8B Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on conversations from [LMSYS-Chat-1M](https://huggingface.co/datasets/lmsys/lmsys-chat-1m).

It also includes natural-language explanations of the features, generated by GPT OSS 120B; see subset 4 for details.

The SAE used is [Goodfire/Llama-3.1-8B-Instruct-SAE-l19](https://huggingface.co/Goodfire/Llama-3.1-8B-Instruct-SAE-l19), which decomposes layer-19 residual stream activations into interpretable sparse features.

## Subsets

This dataset has **four subsets** (configs), each serving a different purpose:

### 1. `feature-activations`: Sparse per-token SAE activations (2.3 GB)

Every token where an SAE feature fired, stored as a sparse table.
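To make the sparse layout concrete: as the columns of this subset suggest, each row records one (feature, token) pair on which a feature fired, so a token where a feature stayed silent has no row for it. The sketch below is an illustration only. It uses hypothetical in-memory rows that borrow the subset's column names (the values are invented, not taken from the dataset), and the helper names `feature_stats` and `features_at` are assumptions introduced here.

```python
from collections import defaultdict

# Hypothetical toy rows mirroring the `feature-activations` columns.
# The values are made up for illustration.
rows = [
    {"layer_id": 19, "feature_id": 42, "activation_value": 0.5,
     "prompt_id": "p1", "token_position": 3},
    {"layer_id": 19, "feature_id": 42, "activation_value": 1.5,
     "prompt_id": "p2", "token_position": 7},
    {"layer_id": 19, "feature_id": 7, "activation_value": 2.0,
     "prompt_id": "p1", "token_position": 3},
]


def feature_stats(rows):
    """Return {feature_id: {"count", "mean", "max"}} computed over sparse rows."""
    values = defaultdict(list)
    for row in rows:
        values[row["feature_id"]].append(row["activation_value"])
    return {
        fid: {"count": len(v), "mean": sum(v) / len(v), "max": max(v)}
        for fid, v in values.items()
    }


def features_at(rows, prompt_id, token_position):
    """Return the sorted feature IDs that fired on a single token."""
    return sorted(
        r["feature_id"]
        for r in rows
        if r["prompt_id"] == prompt_id and r["token_position"] == token_position
    )


stats = feature_stats(rows)
```

Because only firing events are stored, the counts and means here are taken over active tokens; computing a firing rate would additionally need the total number of tokens per prompt, which this table alone does not provide.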
```python
from datasets import load_dataset

ds = load_dataset(
    "scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m",
    "feature-activations",
    split="train",
)
```

| Column | Type | Description |
|--------|------|-------------|
| `layer_id` | int32 | Always 19 |
| `feature_id` | int32 | SAE feature index |
| `activation_value` | float32 | Activation strength |
| `prompt_id` | string | Links to the source prompt |
| `token_position` | int32 | Token index in the sequence |

**Use cases:** Find which features fire on specific tokens, analyze feature co-occurrence, compute feature statistics.

### 2. `prompt-level-features`: Per-prompt aggregated data (47.6 GB)

Each row is one prompt with its raw activations, SAE decomposition, and active feature list.

```python
from datasets import load_dataset

ds = load_dataset(
    "scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m",
    "prompt-level-features",
    split="train",
)
```

| Column | Type | Description |
|--------|------|-------------|
| `prompt_id` | string | Unique conversation ID |
| `prompt` | string | Full conversation text (JSON array of messages) |
| `sampled_positions` | list\<int32\> | Token positions that were sampled for analysis |
| `activations_layer19` | binary | Raw layer-19 residual stream activations (compressed) |
| `sae_representation` | binary | SAE feature decomposition (JSON, compressed): a list of `[feature_id, activation_value]` pairs per position |
| `active_sae_feature_ids` | list\<int32\> | All SAE feature IDs active anywhere in this prompt |

**Use cases:** Prompt-level feature analysis, clustering prompts by active features, studying which features characterize different conversation types.

### 3. `llm-explanations-input`: Top activating examples per feature (241 MB)

The highest-activating prompts for each SAE feature, used as input for generating natural-language explanations.
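As a rough illustration of what "highest-activating examples per feature" means, the sketch below ranks toy rows by `activation_value` within each feature, with rank 1 as the strongest. It is not the procedure used to build this subset, which the card does not describe. The function name `top_k_per_feature`, the cutoff `k`, and the toy values are assumptions, and the sketch keeps a `prompt_id` where the real subset stores the prompt text.

```python
import heapq
from collections import defaultdict


def top_k_per_feature(rows, k=3):
    """Rank each feature's rows by activation_value; rank 1 is the strongest."""
    by_feature = defaultdict(list)
    for row in rows:
        by_feature[row["feature_id"]].append(row)

    ranked = []
    for fid in sorted(by_feature):
        # nlargest returns at most k rows, ordered from highest activation down.
        best = heapq.nlargest(
            k, by_feature[fid], key=lambda r: r["activation_value"]
        )
        for rank, row in enumerate(best, start=1):
            ranked.append({
                "feature_id": fid,
                "rank": rank,
                "prompt_id": row["prompt_id"],
                "activation_value": row["activation_value"],
                "token_position": row["token_position"],
            })
    return ranked


# Hypothetical toy rows; the values are invented for illustration.
toy_rows = [
    {"feature_id": 42, "activation_value": 0.5, "prompt_id": "p1", "token_position": 3},
    {"feature_id": 42, "activation_value": 1.5, "prompt_id": "p2", "token_position": 7},
    {"feature_id": 42, "activation_value": 1.0, "prompt_id": "p3", "token_position": 1},
    {"feature_id": 7, "activation_value": 2.0, "prompt_id": "p1", "token_position": 3},
]

top = top_k_per_feature(toy_rows, k=2)
```

A real pipeline might first reduce each prompt to its peak activation so that one long prompt cannot fill every rank; that choice is left open here.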
```python
from datasets import load_dataset

ds = load_dataset(
    "scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m",
    "llm-explanations-input",
    split="train",
)
```

| Column | Type | Description |
|--------|------|-------------|
| `feature_id` | int64 | SAE feature index |
| `rank` | int64 | Rank among top activations (1 = highest) |
| `prompt_text` | string | The prompt text where this feature activated strongly |
| `activation_value` | float64 | Activation strength |
| `token_position` | int64 | Token position of peak activation |

**Use cases:** Understand what each SAE feature responds to by examining its top-activating examples.

### 4. `llm-explanations-output`: Natural-language feature explanations (6.8 MB)

LLM-generated explanations describing what each SAE feature represents.

Explanations were generated with gpt-oss:120b, using the following prompt-construction code:

```python
# Excerpt from the generation code: `extract_token_window`, `tokenizer`,
# and `formatted_examples` are not shown here.
context_window = extract_token_window(
    prompt_text, token_position, tokenizer, window_size=200
)
formatted_examples.append(
    f"Example {i} (activation: {activation:.3f}, position: {token_position}):\n{context_window}\n"
)

formatted_text = "\n".join(formatted_examples)

prompt = f"""Looking at these examples where a neural network feature (Feature {feature_id}) activates strongly:

{formatted_text}

What pattern or concept does this feature detect? Be concise and specific.

Respond with JSON containing:
- "explanation": 2-3 sentence detailed explanation of the pattern
- "concept": short label, 2-5 words"""
```

```python
from datasets import load_dataset

ds = load_dataset(
    "scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m",
    "llm-explanations-output",
    split="train",
)
```

| Column | Type | Description |
|--------|------|-------------|
| `feature_id` | int64 | SAE feature index |
| `explanation` | string | Detailed natural-language explanation of the feature |
| `concept` | string | Short concept label (e.g., "LLM-focused meta discussion") |

**Use cases:** Look up what any SAE feature means, build feature dashboards, map features to human-interpretable concepts.

## Quick Start

```python
from datasets import load_dataset

# Load feature explanations to understand what features mean
explanations = load_dataset(
    "scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m",
    "llm-explanations-output",
    split="train"
)

# Find a feature by concept
for row in explanations:
    if "code" in row["concept"].lower():
        print(f"Feature {row['feature_id']}: {row['concept']}")
        print(f"  {row['explanation'][:200]}...")
        break

# Load sparse activations to find where a feature fires
activations = load_dataset(
    "scaleinvariant/sae-activations-llama-3.1-8b-layer19-lmsys-chat-1m",
    "feature-activations",
    split="train"
)

# Filter for a specific feature (feature 42 as an example)
feature_42 = activations.filter(lambda x: x["feature_id"] == 42)
print(f"Feature 42 fired {len(feature_42)} times")
```

## Details

- **Base model**: `meta-llama/Llama-3.1-8B-Instruct`
- **SAE model**: [Goodfire/Llama-3.1-8B-Instruct-SAE-l19](https://huggingface.co/Goodfire/Llama-3.1-8B-Instruct-SAE-l19)
- **Layer**: 19
- **Source corpus**: LMSYS-Chat-1M (~100K prompts)
- **Hidden dimension**: 4096
- **100 parquet shards** for feature-activations and prompt-level-features (10 workers × 10 batches)
- **10 parquet shards** for llm-explanations (1 per worker)

## License

This dataset is released under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/).

Provider: scaleinvariant