
maximuspowers/muat-pca-10-medium

Source: Hugging Face. Updated 2025-12-06. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/maximuspowers/muat-pca-10-medium
Resource description:
---
language: en
task_categories:
- text-generation
---

# Subject Models for Interpretability Training

These examples are intended for training an interpreter to:
- Identify what patterns a model classifies as positive based on an activation signature, with examples of: trained model + signature → pattern identification.

| Signature Extraction | Value |
|----------------------|-------|
| Neuron Profile Methods | pca |
| Prompt Format | separate |
| Signature Dataset | configs/dataset_gen/signature_dataset.json |

| Model Architecture | Value |
|----------------------|-------|
| Number of Layers | 8 to 10 |
| Neurons per Layer | 10 to 15 |
| Activation Types | relu, gelu |
| Pattern Vocab Size | 10 |
| Pattern Sequence Len | 5 |

| Training Datasets | Value |
|----------------------|-------|
| Enabled Patterns | palindrome, sorted_ascending, sorted_descending, alternating, contains_abc, starts_with, ends_with, no_repeats, has_majority, increasing_pairs, decreasing_pairs, vowel_consonant, first_last_match, mountain_pattern |
| Patterns per Batch | 1-1 |
| Pos/Neg Ratio | 1:1 |
| Target Total Examples per Subject Model | 250 |

| Staged Training | Value |
|----------------------|-------|
| Min Improvement Threshold | 0.05 (5.0%) |
| Corruption Rate | 0.15 (15.0%) |

## Token Count Statistics

| Task Type | Min Tokens | Max Tokens | Avg Tokens |
|-----------|------------|------------|------------|
| Classification | 11581 | 26103 | 18025.0 |

## Dataset Fields

| Field | Description |
|----------------------|-------------|
| example_id | Unique identifier for each example |
| metadata | JSON string containing: |
| | - `target_pattern`: The pattern that was corrupted during training |
| | - `degraded_accuracy`: Accuracy of the model trained on corrupted data |
| | - `improved_accuracy`: Accuracy of the model after training on clean data |
| | - `improvement`: Delta between degraded and improved accuracy |
| | - `model_config`: Subject model architecture and hyperparameters |
| | - `corruption_stats`: Details about label corruption |
| | - `selected_patterns`: All patterns in the subject model's training dataset |
| | - `precision`: Model weight precision |
| | - `quantization`: Quantization type applied to weights |
| | - `config_signature`: Hash of critical config fields for validation |
| classification_prompt | Input prompt with improved model weights and signature |
| classification_completion | Target completion identifying the pattern |
| classification_text | Full concatenated text (prompt + completion) |
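
Because `metadata` is stored as a JSON string rather than a nested object, it has to be decoded before its sub-fields can be read. The following is a minimal sketch of that step, not the dataset author's code. The record shown is hypothetical: the field names come from the table above, but every value is invented for illustration. The sketch also assumes that `improvement` means `improved_accuracy - degraded_accuracy` and that the 0.05 threshold is applied with `>=`; the card does not spell out either detail.

```python
import json

# "Min Improvement Threshold" as listed under Staged Training.
MIN_IMPROVEMENT = 0.05

# Hypothetical record: keys follow the Dataset Fields table,
# values are made up for this example only.
example = {
    "example_id": "demo-0",
    "metadata": json.dumps({
        "target_pattern": "palindrome",
        "degraded_accuracy": 0.80,
        "improved_accuracy": 0.90,
        "improvement": 0.10,
        "selected_patterns": ["palindrome"],
    }),
}


def summarize(record, threshold=MIN_IMPROVEMENT):
    """Decode the metadata string and recompute the accuracy delta."""
    meta = json.loads(record["metadata"])
    # Assumption: improvement = improved_accuracy - degraded_accuracy.
    delta = meta["improved_accuracy"] - meta["degraded_accuracy"]
    return {
        "target_pattern": meta["target_pattern"],
        "recomputed_improvement": round(delta, 6),
        # Assumption: the threshold is inclusive.
        "meets_threshold": delta >= threshold,
    }


summary = summarize(example)
print(summary)
```

Recomputing the delta instead of trusting the stored `improvement` value gives a cheap consistency check when filtering examples.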

Provider:
maximuspowers