Butanium/max-activating-examples-gemma-2-2b-l13-mu4.1e-02-lr1e-04
Source: Hugging Face. Updated: 2024-11-25. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/Butanium/max-activating-examples-gemma-2-2b-l13-mu4.1e-02-lr1e-04
Description:
This dataset contains the maximum activating examples for all the features of our crosscoder trained on layer 13 of Gemma 2 2B. It consists of three files: `base_examples.pt`, `chat_examples.pt`, and `chat_base_examples.pt`. These hold the maximum activating examples on a validation/test subset of fineweb, on lmsys chat data, and on a merge of the two, respectively. Every file is of type `dict[int, list[tuple[float, list[str], list[float]]]]`: each key is a feature index, and each value is a list of tuples containing the maximum activation value of the sample, the list of tokens of the sample, and the activation value of each token. Within each list, the samples are sorted from the highest activating example to the lowest.
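The structure described above can be explored with a short Python sketch. It makes several assumptions that the description does not confirm: that the `.pt` files were written with `torch.save` and can be read with `torch.load`, that the repository can be fetched as a dataset repo with `huggingface_hub.hf_hub_download`, and that the helper names `load_examples`, `top_examples`, and `peak_token` are purely illustrative. The demo at the bottom uses a tiny hand-made dictionary with the documented type rather than real data.

```python
from typing import Dict, List, Tuple

# Type described in the dataset card:
# (max activation, tokens of the sample, per-token activations)
Example = Tuple[float, List[str], List[float]]
Examples = Dict[int, List[Example]]

REPO_ID = "Butanium/max-activating-examples-gemma-2-2b-l13-mu4.1e-02-lr1e-04"


def load_examples(filename: str = "chat_examples.pt") -> Examples:
    """Download one file from the dataset repo and load it (needs network access)."""
    # Imported lazily so the helpers below can be used without these packages.
    import torch
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=REPO_ID, filename=filename, repo_type="dataset")
    # Assumption: the file is a torch.save pickle of plain Python containers.
    # Depending on the PyTorch version, the weights_only argument may need adjusting.
    return torch.load(path)


def top_examples(examples: Examples, feature: int, k: int = 3) -> List[Example]:
    """Return the k strongest examples for one feature index."""
    # Lists are already sorted from highest to lowest activation, so slicing suffices.
    return examples[feature][:k]


def peak_token(example: Example) -> Tuple[str, float]:
    """Return the token with the largest activation in one example."""
    _max_act, tokens, acts = example
    idx = max(range(len(acts)), key=acts.__getitem__)
    return tokens[idx], acts[idx]


# Toy data with the documented structure (not taken from the real files).
toy: Examples = {
    0: [
        (2.5, ["The", " cat", " sat"], [0.1, 2.5, 0.3]),
        (1.0, ["Hello", " world"], [1.0, 0.2]),
    ]
}

best = top_examples(toy, feature=0, k=1)[0]
print(peak_token(best))
```

With real data, `examples = load_examples("base_examples.pt")` would replace the toy dictionary, and the same helpers would apply unchanged if the files match the documented type.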
Provider:
Butanium



