Name: maximuspowers/muat-pca-10-medium
Creator: maximuspowers
Published: 2025-12-06 11:15:05
License: No description provided

Download link:

https://hf-mirror.com/datasets/maximuspowers/muat-pca-10-medium

Description:

---
language: en
task_categories:
- text-generation
---

# Subject Models for Interpretability Training

These examples are intended for training an interpreter to:
- Identify what patterns a model classifies as positive based on an activation signature, with examples of: trained model + signature → pattern identification.

| Signature Extraction | |
|----------------------|----------------------|
| Neuron Profile Methods | pca |
| Prompt Format | separate |
| Signature Dataset | configs/dataset_gen/signature_dataset.json |

| Model Architecture | |
|----------------------|----------------------|
| Number of Layers | 8 to 10 |
| Neurons per Layer | 10 to 15 |
| Activation Types | relu, gelu |
| Pattern Vocab Size | 10 |
| Pattern Sequence Len | 5 |

| Training Datasets | |
|----------------------|----------------------|
| Enabled Patterns | palindrome, sorted_ascending, sorted_descending, alternating, contains_abc, starts_with, ends_with, no_repeats, has_majority, increasing_pairs, decreasing_pairs, vowel_consonant, first_last_match, mountain_pattern |
| Patterns per Batch | 1-1 |
| Pos/Neg Ratio | 1:1 |
| Target Total Examples per Subject Model | 250 |

| Staged Training | |
|----------------------|----------------------|
| Min Improvement Threshold | 0.05 (5.0%) |
| Corruption Rate | 0.15 (15.0%) |

## Token Count Statistics

| Task Type | Min Tokens | Max Tokens | Avg Tokens |
|-----------|------------|------------|------------|
| Classification | 11581 | 26103 | 18025.0 |

## Dataset Fields

| Field | Description |
|----------------------|----------------------|
| example_id | Unique identifier for each example |
| metadata | JSON string containing: |
| | - `target_pattern`: The pattern that was corrupted during training |
| | - `degraded_accuracy`: Accuracy of the model trained on corrupted data |
| | - `improved_accuracy`: Accuracy of the model after training on clean data |
| | - `improvement`: Delta between degraded and improved accuracy |
| | - `model_config`: Subject model architecture and hyperparameters |
| | - `corruption_stats`: Details about label corruption |
| | - `selected_patterns`: All patterns in the subject model's training dataset |
| | - `precision`: Model weight precision |
| | - `quantization`: Quantization type applied to weights |
| | - `config_signature`: Hash of critical config fields for validation |
| classification_prompt | Input prompt with improved model weights and signature |
| classification_completion | Target completion identifying the pattern |
| classification_text | Full concatenated text (prompt + completion) |
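Because `metadata` is stored as a JSON string rather than a nested object, it has to be decoded before its subfields can be used. The following is a minimal sketch of that step, not the dataset author's code. It assumes that the accuracy fields decode to plain floats and that `improvement` is computed as `improved_accuracy - degraded_accuracy`; the card confirms neither. The helper names `parse_metadata` and `meets_threshold` and the row `example_row` are hypothetical, and the row's values are made up for illustration.

```python
import json

# "Min Improvement Threshold" as listed in the Staged Training table.
MIN_IMPROVEMENT = 0.05


def parse_metadata(row):
    """Decode the JSON-string `metadata` field of one example into a dict."""
    return json.loads(row["metadata"])


def meets_threshold(meta, threshold=MIN_IMPROVEMENT, tol=1e-9):
    """Return True if the accuracy gain reaches the staged-training threshold.

    Assumption: the gain is improved_accuracy - degraded_accuracy.
    A small tolerance absorbs floating-point rounding.
    """
    delta = meta["improved_accuracy"] - meta["degraded_accuracy"]
    return delta + tol >= threshold


# Hypothetical row: these values are illustrative, not taken from the dataset.
example_row = {
    "example_id": "demo-0",
    "metadata": json.dumps({
        "target_pattern": "palindrome",
        "degraded_accuracy": 0.80,
        "improved_accuracy": 0.90,
        "selected_patterns": ["palindrome"],
    }),
}

meta = parse_metadata(example_row)
print(meta["target_pattern"], meets_threshold(meta))
```

The same two helpers could be applied to rows obtained from any loader that yields dict-like examples with a `metadata` key.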


Use cases: