jaygala24/reasoning-models-interpretability-artifacts

Name: jaygala24/reasoning-models-interpretability-artifacts
Creator: jaygala24
Published: 2026-04-26 05:27:42
License: No description available

Source: Hugging Face. Updated 2026-04-26. Indexed 2026-05-03.

Download link:

https://hf-mirror.com/datasets/jaygala24/reasoning-models-interpretability-artifacts


Resource summary:

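This dataset contains intermediate artifacts for studying reasoning traces in open-weight language models: hidden representations of annotated traces and spectral metrics computed over reasoning-step categories. The artifacts are intended for analysis and sharing, not for loading directly as a tabular dataset through `datasets.load_dataset(...)`. The dataset covers several models, including the Olmo and Qwen series, and provides hidden-state tensors together with spectral metrics (such as RankMe and alpha-ReQ) under four step-pooling strategies: all tokens, mean pooling, first token, and last token.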
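The four step-pooling strategies named in the summary can be sketched as follows. This is a minimal illustration, not the dataset's own code: the function name `pool_step` and its `strategy` argument are hypothetical, and it assumes a step's hidden states are already available as a 2-D array of shape `(num_tokens, hidden_dim)`.

```python
import numpy as np


def pool_step(step_hidden: np.ndarray, strategy: str) -> np.ndarray:
    """Reduce one step's token representations to the rows used for analysis.

    `pool_step` and `strategy` are hypothetical names for illustration only.
    The result is always 2-D so that rows from many steps can be stacked.
    """
    if step_hidden.ndim != 2 or step_hidden.shape[0] == 0:
        raise ValueError("expected a non-empty (num_tokens, hidden_dim) array")
    if strategy == "all":    # keep every token in the step
        return step_hidden
    if strategy == "mean":   # average over the step's tokens
        return step_hidden.mean(axis=0, keepdims=True)
    if strategy == "first":  # first token of the step
        return step_hidden[:1]
    if strategy == "last":   # last token of the step
        return step_hidden[-1:]
    raise ValueError(f"unknown pooling strategy: {strategy!r}")
```

Returning a 2-D array in every case means the pooled rows from all steps can be concatenated with `np.concatenate` before a spectral metric is computed on them.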

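---
Dataset name: Reasoning Models Interpretability Artifacts
Language:
- English
Tags:
- reasoning
- interpretability
- hidden-states
- spectral-analysis
- Transformer
- safetensors
Size category:
- 1K<n<10K
License: other
---

# Reasoning Models Interpretability Artifacts

This dataset contains intermediate artifacts for studying reasoning traces in open-weight language models. It covers hidden representations of annotated traces and spectral metrics computed over reasoning-step categories. The artifacts are meant for analysis and sharing only; they cannot be loaded directly as a tabular dataset with `datasets.load_dataset(...)`.

## Contents

```text
annotated_traces_reprs/
  <model>/
    config.json
    index.json
    hidden_states_layer<layer>_shard*.safetensors
    extraction_*.log

spectral_metrics/
  <model>/
    pool_all/
      basic_metrics.json
      depth_profile.json
      token_count_sweep.json
      svd_vs_covariance.json
    pool_mean/
      ...
    pool_first/
      ...
    pool_last/
      ...
```

## Models

Artifacts are provided for the following models:

| Directory | Model |
|---|---|
| `olmo-3-7b-think` | `allenai/Olmo-3-7B-Think` |
| `olmo-3-7b-think-sft` | `allenai/Olmo-3-7B-Think-SFT` |
| `olmo-3-7b-think-dpo` | `allenai/Olmo-3-7B-Think-DPO` |
| `qwen3-4b-thinking-2507` | `Qwen/Qwen3-4B-Thinking-2507` |
| `qwen3-4b-instruct-2507` | `Qwen/Qwen3-4B-Instruct-2507` |

## Representation Format

Each `annotated_traces_reprs/<model>/` directory contains:

- `config.json`: model name, saved layer index, hidden dimension, storage dtype, token count, and shard metadata.
- `index.json`: maps annotated samples and steps to global token ranges in the hidden-state tensors.
- `hidden_states_layer*_shard*.safetensors`: sharded hidden states for the target layer, stored as `bfloat16` under the tensor key `hidden_states`.

The tensors are aligned with the annotated reasoning steps through `index.json`. For a step with range `(global_start, global_end)`, slice the corresponding rows from the concatenated shard hidden-state matrix.

## Spectral Metrics

The `spectral_metrics/` directory contains the unified output layout produced by `compute_spectral_metrics.py`:

- `basic_metrics.json`: RankMe and alpha-ReQ computed globally, for the thinking/solution segments, and per macro reasoning category.
- `depth_profile.json`: spectral metrics binned by relative depth within the reasoning trace.
- `token_count_sweep.json`: per-category and global spectral metrics at matched token counts.
- `svd_vs_covariance.json`: comparison of effective rank computed from the centered covariance, the centered singular value decomposition (SVD), and the uncentered SVD.

Each model is evaluated under four step-pooling strategies:

- `pool_all`: all tokens in each annotated step.
- `pool_mean`: mean pooling over the step's representations.
- `pool_first`: the first token of each step.
- `pool_last`: the last token of each step.

## Download Examples

Download only the small spectral-metric files:

```bash
huggingface-cli download jaygala24/reasoning-models-interpretability-artifacts --repo-type dataset --include "spectral_metrics/**" --local-dir ./reasoning-models-interpretability-artifacts
```

Download the hidden representations for a single model:

```bash
huggingface-cli download jaygala24/reasoning-models-interpretability-artifacts --repo-type dataset --include "annotated_traces_reprs/olmo-3-7b-think/**" --local-dir ./reasoning-models-interpretability-artifacts
```

## Loading a Hidden-State Slice

```python
import json
from pathlib import Path

import torch
from safetensors import safe_open

model_dir = Path("reasoning-models-interpretability-artifacts/annotated_traces_reprs/olmo-3-7b-think")

with open(model_dir / "config.json") as f:
    config = json.load(f)
with open(model_dir / "index.json") as f:
    index = json.load(f)

# Convert the first step's sample-relative token range to global row indices.
sample = index["samples"][0]
step = sample["steps"][0]
global_start = sample["global_offset"] + step["token_start"]
global_end = sample["global_offset"] + step["token_end"]

# Read the overlapping rows from every shard the step touches, so a step
# that straddles a shard boundary is not truncated.
pieces = []
for shard in config["shards"]:
    if global_start < shard["token_end"] and global_end > shard["token_start"]:
        local_start = max(global_start, shard["token_start"]) - shard["token_start"]
        local_end = min(global_end, shard["token_end"]) - shard["token_start"]
        with safe_open(str(model_dir / shard["file"]), framework="pt") as f:
            pieces.append(f.get_slice("hidden_states")[local_start:local_end])

hidden = torch.cat(pieces, dim=0)
print(hidden.shape)
```

## Provenance

These artifacts derive from reasoning traces generated on OpenThoughts-style problems. The traces were annotated with reasoning-step categories, and representations were computed from each model's final saved Transformer layer. The spectral metrics were computed on these representations with RankMe and alpha-ReQ. For the scripts and Jupyter notebooks used to generate and analyze the artifacts, see the source code repository.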
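
For readers who want to recompute the two spectral metrics on a hidden-state slice, the sketch below shows their commonly cited definitions. RankMe is the exponential of the Shannon entropy of the normalized singular values, and alpha-ReQ is the decay exponent of a power law fitted to the covariance eigenspectrum. It is not taken from `compute_spectral_metrics.py`: the function names, the `center` flag, the `eps` values, and the choice to fit over the full spectrum are assumptions made for illustration.

```python
import numpy as np


def rankme(reprs: np.ndarray, center: bool = False, eps: float = 1e-7) -> float:
    """Effective rank: exp of the entropy of the normalized singular values.

    `center` and `eps` are illustrative assumptions, not the dataset's settings.
    """
    x = np.asarray(reprs, dtype=np.float64)  # upcast, e.g. from bfloat16 data
    if center:
        x = x - x.mean(axis=0, keepdims=True)
    s = np.linalg.svd(x, compute_uv=False)
    p = s / s.sum() + eps
    return float(np.exp(-(p * np.log(p)).sum()))


def powerlaw_alpha(eigvals: np.ndarray, eps: float = 1e-12) -> float:
    """Least-squares slope of log(eigenvalue) against log(rank), negated.

    Fits over all eigenvalues above `eps`; the fitting range is an assumption.
    """
    lam = np.sort(np.asarray(eigvals, dtype=np.float64))[::-1]
    lam = lam[lam > eps]
    ranks = np.arange(1, len(lam) + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(lam), 1)
    return float(-slope)


def covariance_eigvals(reprs: np.ndarray) -> np.ndarray:
    """Eigenvalues of the centered sample covariance, obtained through the SVD."""
    x = np.asarray(reprs, dtype=np.float64)
    x = x - x.mean(axis=0, keepdims=True)
    s = np.linalg.svd(x, compute_uv=False)
    return s**2 / max(x.shape[0] - 1, 1)
```

With a tensor loaded from the shards, `rankme(hidden.float().numpy())` and `powerlaw_alpha(covariance_eigvals(hidden.float().numpy()))` would give values comparable in spirit to those in `basic_metrics.json`. Exact agreement should not be expected unless the centering and fitting choices match the original script.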

Provider:

jaygala24
