
caiovicentino1/qwen35-a3b-thinking-traces

Source: Hugging Face. Updated 2026-04-21; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/caiovicentino1/qwen35-a3b-thinking-traces
Description:
---
license: mit
tags:
- mechanistic-interpretability
- sparse-autoencoders
- qwen3.5
- thinking-models
size_categories:
- 10K<n<100K
---

# Qwen3.5-35B-A3B Thinking Traces: SAE Training Data

Per-sentence L17 residual activations from Qwen/Qwen3.5-35B-A3B generating CoT on MMLU-Pro.

## Stats

- Model: `Qwen/Qwen3.5-35B-A3B`
- Layer: L17 residual (~42% depth of 40-layer hybrid MoE)
- Prompts: 2000 from MMLU-Pro test
- Sentences: 41285
- d_model: 2048
- Activation dtype: float16

## Purpose

Replication of Venhoff et al. 2025 (arXiv:2510.07364), "Base Models Know How to Reason, Thinking Models Learn When", applied to a hybrid MoE+GDN+Gated-Attn architecture. Phase 1 of 3 (data generation). Next: tiny TopK SAE training (n=15, k=3) to cluster reasoning categories.

## Load

```python
import json

from huggingface_hub import snapshot_download
from safetensors.numpy import load_file

# Download (or reuse the cached copy of) the dataset repository.
path = snapshot_download('caiovicentino1/qwen35-a3b-thinking-traces', repo_type='dataset')

# Activations load as a dict of NumPy arrays; sentences are plain JSON.
data = load_file(f'{path}/activations.safetensors')
with open(f'{path}/sentences.json') as f:
    sentences = json.load(f)
```
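The card names a tiny TopK SAE (n=15, k=3) as the next phase but does not describe it. The following is a minimal sketch of a generic TopK sparse autoencoder forward pass in NumPy, not the author's implementation. It assumes that n is the number of latents and k the number of latents kept active per sentence. The function name `topk_sae_forward`, the random weights, and the random input standing in for real activations are all hypothetical, as is using the strongest latent as a cluster label.

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k=3):
    """Generic TopK SAE forward pass (illustrative sketch).

    x:     (batch, d_model) activations
    W_enc: (d_model, n_latents), W_dec: (n_latents, d_model)
    Returns the sparse codes z and the reconstruction x_hat.
    """
    # Encoder pre-activations, after centering by the decoder bias.
    pre = (x - b_dec) @ W_enc + b_enc                # (batch, n_latents)
    # Indices of the k largest pre-activations in each row.
    idx = np.argpartition(pre, -k, axis=1)[:, -k:]
    rows = np.arange(pre.shape[0])[:, None]
    # Keep only those k entries (passed through a ReLU); zero the rest.
    z = np.zeros_like(pre)
    z[rows, idx] = np.maximum(pre[rows, idx], 0.0)
    # Linear decoder reconstructs the input from the sparse code.
    x_hat = z @ W_dec + b_dec
    return z, x_hat

# Assumed hyperparameters: d_model from the Stats list, n and k from the card.
d_model, n_latents, k = 2048, 15, 3

rng = np.random.default_rng(0)
# Random stand-in for real activations (which are float16 and would be
# upcast to float32 before any training).
x = rng.standard_normal((8, d_model)).astype(np.float32)

# Untrained, randomly initialised weights with a tied decoder, for shape checks only.
W_enc = (rng.standard_normal((d_model, n_latents)) / np.sqrt(d_model)).astype(np.float32)
W_dec = W_enc.T.copy()
b_enc = np.zeros(n_latents, dtype=np.float32)
b_dec = np.zeros(d_model, dtype=np.float32)

z, x_hat = topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k=k)

# One possible cluster assignment: the index of the strongest latent.
cluster = z.argmax(axis=1)
```

With only 15 latents and at most 3 active per sentence, each sentence maps to a very small code, which is what makes the latents usable as coarse category labels once the weights are trained.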
Provider:
caiovicentino1