withmartian/SDS_train_mmlu-pro
Source: Hugging Face. Updated 2026-03-04; indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/withmartian/SDS_train_mmlu-pro
Resource description:
---
language:
- en
license: mit
tags:
- mechanistic-interpretability
- activations
- reasoning
- chain-of-thought
- switching-dynamical-systems
pretty_name: SDS Train MMLU-Pro
size_categories:
- 10K<n<100K
---
# SDS Train - MMLU-Pro
Activation extraction dataset for studying **Switching Dynamical Systems (SDS)** in reasoning LLMs, generated from the [TIGER-Lab/MMLU-Pro](https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro) benchmark (test split, ~4000 samples per model).
## Models
Reasoning models (distilled from DeepSeek-R1) with their corresponding base models:
| Reasoning Model | Base Model | Layers Extracted |
|---|---|---|
| `deepseek-ai/DeepSeek-R1-Distill-Qwen-14B` | `Qwen/Qwen2.5-14B` | 28 (middle), 47 (final) |
| `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` | `Qwen/Qwen2.5-Math-1.5B` | 20 (middle), 27 (final) |
| `deepseek-ai/DeepSeek-R1-Distill-Llama-8B` | `meta-llama/Llama-3.1-8B` | 22 (middle), 31 (final) |
## Structure
```
<model>/<layer>/
    raw_extractions.pkl                      # Per-problem CoT, sentences, hidden states
    all_sentences_features.pkl               # Flattened features (non-neutral stages only)
    all_sentences_features_with_neutral.pkl  # All features including NEUTRAL
    cot_data.pkl                             # Problem text, CoT, and sentence splits
```
The dataset currently contains activations from the reasoning models only. Base model activations for the same layers and samples are forthcoming.
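The layout above can be read with a small helper like the one below. This is a minimal sketch, not the authors' loader: the helper names (`repo_file_path`, `load_pickle`, `fetch_and_load`) are hypothetical, and the exact `<model>` and `<layer>` directory names are not given on this card, so they must be looked up in the repository's file listing. Pickle files can execute arbitrary code when loaded, so only unpickle files from a source you trust.

```python
import pickle
from pathlib import Path, PurePosixPath

REPO_ID = "withmartian/SDS_train_mmlu-pro"


def repo_file_path(model_dir: str, layer_dir: str, filename: str) -> str:
    """Join <model>/<layer>/<file> into a repo-relative POSIX path."""
    return str(PurePosixPath(model_dir) / layer_dir / filename)


def load_pickle(local_path):
    """Unpickle a local file. Only use this on files you trust."""
    with Path(local_path).open("rb") as fh:
        return pickle.load(fh)


def fetch_and_load(model_dir, layer_dir, filename="all_sentences_features.pkl"):
    """Download one file from the dataset repo and unpickle it.

    model_dir and layer_dir are placeholders (assumption): check the
    repository file listing for the real directory names.
    """
    # Imported lazily so the helpers above work without huggingface_hub.
    from huggingface_hub import hf_hub_download

    local = hf_hub_download(
        repo_id=REPO_ID,
        filename=repo_file_path(model_dir, layer_dir, filename),
        repo_type="dataset",
    )
    return load_pickle(local)
```

Loading may also require the libraries that were used to create the objects inside the pickle (for example an array or tensor library), since unpickling needs those classes to be importable.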
## Reasoning Stage Classification
Each sentence in the CoT is classified by `Qwen/Qwen2.5-7B-Instruct` into one of the 8 stages below, or labeled `NEUTRAL`:
`PROBLEM_SETUP`, `FACT_RETRIEVAL`, `PLAN_GENERATION`, `UNCERTAINTY_MANAGEMENT`, `SELF_CHECKING`, `RESULT_CONSOLIDATION`, `ACTIVE_COMPUTATION`, `FINAL_ANSWER_EMISSION`
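For downstream modelling it is often convenient to map the stage labels to integer ids. The sketch below does that for the 8 stages listed above; the id ordering simply follows the order on this card and is an arbitrary choice, and encoding `NEUTRAL` as `-1` is likewise an assumption rather than a convention of the dataset.

```python
STAGES = (
    "PROBLEM_SETUP",
    "FACT_RETRIEVAL",
    "PLAN_GENERATION",
    "UNCERTAINTY_MANAGEMENT",
    "SELF_CHECKING",
    "RESULT_CONSOLIDATION",
    "ACTIVE_COMPUTATION",
    "FINAL_ANSWER_EMISSION",
)
NEUTRAL = "NEUTRAL"

# Arbitrary but fixed mapping: position in STAGES becomes the integer id.
STAGE_TO_ID = {name: i for i, name in enumerate(STAGES)}


def encode_stage(label: str) -> int:
    """Return the integer id of a stage label, or -1 for NEUTRAL."""
    if label == NEUTRAL:
        return -1
    try:
        return STAGE_TO_ID[label]
    except KeyError:
        raise ValueError(f"unknown stage label: {label!r}") from None
```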
## Feature Format
Each entry in `all_sentences_features.pkl` contains:
- `hidden_state`: activation vector from the specified middle/final layer
- `hidden_state_last`: activation vector from the model's last layer
- `problem_id`: index into the dataset
- `sentence_idx`, `sentence`: index of the sentence within the CoT, and the sentence text
- `stage`: classified reasoning stage
- `is_anchor`: True if stage is not NEUTRAL
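The fields above are enough to recover each problem's stage sequence and count stage-to-stage transitions, which is a natural first statistic when studying switching behaviour. The sketch below assumes the pickle holds an iterable of dict-like entries keyed by the field names listed above; the container type is not stated on this card, so adjust the access pattern if it differs. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def stage_sequences(entries):
    """Group entries by problem_id and return stages in sentence order."""
    by_problem = defaultdict(list)
    for entry in entries:
        by_problem[entry["problem_id"]].append(
            (entry["sentence_idx"], entry["stage"])
        )
    return {
        pid: [stage for _, stage in sorted(pairs, key=lambda p: p[0])]
        for pid, pairs in by_problem.items()
    }


def transition_counts(sequences):
    """Count (previous_stage, next_stage) pairs across all problems."""
    counts = Counter()
    for stages in sequences.values():
        counts.update(zip(stages, stages[1:]))
    return counts


# Tiny synthetic example (not real data from the dataset).
example = [
    {"problem_id": 0, "sentence_idx": 1, "stage": "ACTIVE_COMPUTATION"},
    {"problem_id": 0, "sentence_idx": 0, "stage": "PROBLEM_SETUP"},
    {"problem_id": 1, "sentence_idx": 0, "stage": "PROBLEM_SETUP"},
    {"problem_id": 1, "sentence_idx": 1, "stage": "FINAL_ANSWER_EMISSION"},
]
example_counts = transition_counts(stage_sequences(example))
```

Because `all_sentences_features.pkl` omits `NEUTRAL` sentences, transitions computed from it link consecutive non-neutral sentences; use `all_sentences_features_with_neutral.pkl` if adjacency in the original CoT matters.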
## Generation
Generated using [withmartian/mi-cot](https://github.com/withmartian/mi-cot) (`mike/multigpu_createData` branch).
Provider:
withmartian



