caiovicentino1/qwen35-a3b-thinking-traces

Name: caiovicentino1/qwen35-a3b-thinking-traces
Creator: caiovicentino1
Published: 2026-04-21 10:46:41
License: MIT (per the dataset card metadata; the listing itself gives no description)

Hugging Face · Updated 2026-04-21 · Indexed 2026-04-26

Download link:

https://hf-mirror.com/datasets/caiovicentino1/qwen35-a3b-thinking-traces

Description:

---
license: mit
tags:
- mechanistic-interpretability
- sparse-autoencoders
- qwen3.5
- thinking-models
size_categories:
- 10K<n<100K
---

# Qwen3.5-35B-A3B Thinking Traces: SAE Training Data

Per-sentence L17 residual activations from Qwen/Qwen3.5-35B-A3B generating CoT on MMLU-Pro.

## Stats

- Model: `Qwen/Qwen3.5-35B-A3B`
- Layer: L17 residual (~42% depth of 40-layer hybrid MoE)
- Prompts: 2000 from MMLU-Pro test
- Sentences: 41285
- d_model: 2048
- Activation dtype: float16

## Purpose

Replication of Venhoff et al. 2025 (arXiv:2510.07364) "Base Models Know How to Reason, Thinking Models Learn When" applied to hybrid MoE+GDN+Gated-Attn architecture.

Phase 1 of 3 (data generation). Next: tiny TopK SAE training (n=15, k=3) to cluster reasoning categories.

## Load

```python
import json

from huggingface_hub import snapshot_download
from safetensors.numpy import load_file

# Download (or reuse the cached copy of) the dataset repository.
path = snapshot_download('caiovicentino1/qwen35-a3b-thinking-traces', repo_type='dataset')

# Activations load as a dict of NumPy arrays; sentences as parsed JSON.
data = load_file(f'{path}/activations.safetensors')
with open(f'{path}/sentences.json') as f:
    sentences = json.load(f)
```
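
The "tiny TopK SAE (n=15, k=3)" named as the next phase could look roughly like the forward pass below. This is only a sketch under stated assumptions, not the author's training code: the class `TinyTopKSAE`, its method names, the tied initialization, and the ReLU applied to the kept latents are all hypothetical choices, and random float16 data stands in for the real activations. Only `d_model=2048`, `n=15`, and `k=3` come from the card.

```python
import numpy as np


class TinyTopKSAE:
    """Hypothetical tiny TopK sparse autoencoder (illustrative only)."""

    def __init__(self, d_model=2048, n_latents=15, k=3, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        # Assumption: unit-norm decoder rows, encoder initialized as the transpose.
        w_dec = rng.standard_normal((n_latents, d_model)).astype(np.float32)
        self.w_dec = w_dec / np.linalg.norm(w_dec, axis=1, keepdims=True)
        self.w_enc = self.w_dec.T.copy()
        self.b_enc = np.zeros(n_latents, dtype=np.float32)
        self.b_dec = np.zeros(d_model, dtype=np.float32)

    def encode(self, x):
        # Upcast first: the stored activations are float16.
        x = np.asarray(x, dtype=np.float32)
        pre = (x - self.b_dec) @ self.w_enc + self.b_enc
        # Keep the k largest pre-activations in each row and zero the rest.
        top_idx = np.argpartition(pre, -self.k, axis=1)[:, -self.k:]
        rows = np.arange(pre.shape[0])[:, None]
        codes = np.zeros_like(pre)
        codes[rows, top_idx] = np.maximum(pre[rows, top_idx], 0.0)
        return codes

    def decode(self, codes):
        # Reconstruct the activation as a sparse sum of decoder directions.
        return codes @ self.w_dec + self.b_dec


# Random stand-in for a batch of 8 sentence activations (not real data).
rng = np.random.default_rng(1)
x = rng.standard_normal((8, 2048)).astype(np.float16)

sae = TinyTopKSAE()
codes = sae.encode(x)            # shape (8, 15), at most 3 non-zeros per row
recon = sae.decode(codes)        # shape (8, 2048)
labels = codes.argmax(axis=1)    # strongest latent per sentence
```

Taking `codes.argmax(axis=1)` assigns each sentence to its strongest latent, which is one plausible way to read "cluster reasoning categories"; the card does not say which assignment rule the later phases use.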

Provided by:

caiovicentino1
