Name: maximuspowers/muat-mean-std-large
Creator: maximuspowers
Published: 2025-12-06 08:45:48
License: No description provided

Download link:

https://hf-mirror.com/datasets/maximuspowers/muat-mean-std-large

Resource overview:

---
language: en
task_categories:
- text-generation
---

# Subject Models for Interpretability Training

These examples are intended for training an interpreter to:
- Identify what patterns a model classifies as positive based on an activation signature, with examples of: trained model + signature → pattern identification.

| Signature Extraction | |
|----------------------|-----------------------------------------------------------------------------|
| Neuron Profile Methods | mean, std |
| Prompt Format | separate |
| Signature Dataset | configs/dataset_gen/signature_dataset.json |

| Model Architecture | |
|----------------------|-----------------------------------------------------------------------------|
| Number of Layers | 8 to 10 |
| Neurons per Layer | 10 to 15 |
| Activation Types | relu, gelu |
| Pattern Vocab Size | 10 |
| Pattern Sequence Len | 5 |

| Training Datasets | |
|----------------------|-----------------------------------------------------------------------------|
| Enabled Patterns | palindrome, sorted_ascending, sorted_descending, alternating, contains_abc, starts_with, ends_with, no_repeats, has_majority, increasing_pairs, decreasing_pairs, vowel_consonant, first_last_match, mountain_pattern |
| Patterns per Batch | 1-1 |
| Pos/Neg Ratio | 1:1 |
| Target Total Examples per Subject Model | 250 |

| Staged Training | |
|----------------------|-----------------------------------------------------------------------------|
| Min Improvement Threshold | 0.05 (5.0%) |
| Corruption Rate | 0.15 (15.0%) |

## Token Count Statistics

| Task Type | Min Tokens | Max Tokens | Avg Tokens |
|-----------|------------|------------|------------|
| Classification | 7699 | 18864 | 12619.8 |

## Dataset Fields

| Field | Description |
|----------------------|-----------------------------------------------------------------------------|
| example_id | Unique identifier for each example |
| metadata | JSON string containing: |
| | - `target_pattern`: The pattern that was corrupted during training |
| | - `degraded_accuracy`: Accuracy of the model trained on corrupted data |
| | - `improved_accuracy`: Accuracy of the model after training on clean data |
| | - `improvement`: Delta between degraded and improved accuracy |
| | - `model_config`: Subject model architecture and hyperparameters |
| | - `corruption_stats`: Details about label corruption |
| | - `selected_patterns`: All patterns in the subject model's training dataset |
| | - `precision`: Model weight precision |
| | - `quantization`: Quantization type applied to weights |
| | - `config_signature`: Hash of critical config fields for validation |
| classification_prompt | Input prompt with improved model weights and signature |
| classification_completion | Target completion identifying the pattern |
| classification_text | Full concatenated text (prompt + completion) |
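
Because `metadata` is stored as a JSON string and not as a nested object, it has to be decoded before its keys can be read. The following is a minimal sketch of that step, not the dataset author's tooling. It assumes accuracies are stored as fractions between 0 and 1, that `improvement` equals `improved_accuracy - degraded_accuracy` (the sign convention is an assumption), and that the 0.05 threshold from the Staged Training table is applied to that delta. The record shown and the helper names `parse_metadata` and `improvement_ok` are hypothetical and purely illustrative.

```python
import json

# "Min Improvement Threshold" from the Staged Training table.
MIN_IMPROVEMENT = 0.05


def parse_metadata(record):
    """Decode the JSON-string `metadata` field of one record into a dict."""
    return json.loads(record["metadata"])


def improvement_ok(meta, threshold=MIN_IMPROVEMENT, tol=1e-6):
    """Check that the stored delta matches the two accuracies and meets the threshold.

    Assumption: improvement = improved_accuracy - degraded_accuracy.
    """
    delta = meta["improved_accuracy"] - meta["degraded_accuracy"]
    consistent = abs(delta - meta["improvement"]) <= tol
    return consistent and delta + tol >= threshold


# Hypothetical record, invented for illustration only; not real dataset content.
example = {
    "example_id": "demo-0",
    "metadata": json.dumps(
        {
            "target_pattern": "palindrome",
            "degraded_accuracy": 0.70,
            "improved_accuracy": 0.90,
            "improvement": 0.20,
        }
    ),
}

meta = parse_metadata(example)
print(meta["target_pattern"], improvement_ok(meta))
```

The same two helpers can be mapped over real records once the dataset is loaded; only the keys listed in the Dataset Fields table are relied on.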

Use cases: