KV Cache Consolidation Logs — Predictive Forgetting for Optimal Generalisation (Figure 5)
Source: Figshare; updated 2026-03-17; indexed 2026-04-28
Download link:
https://figshare.com/articles/dataset/KV_Cache_Consolidation_Logs_Predictive_Forgetting_for_Optimal_Generalisation_Figure_5_/31534807
Resource summary:
Pre-computed key-value (KV) cache logs from the large language model (LLM) consolidation experiments reported in Figure 5 of "Why the Brain Consolidates: Predictive Forgetting for Optimal Generalisation" (Fountas et al., arXiv:2603.04688). These logs were generated using the Bottlenecked Transformer architecture (Oomerjee et al., ICLR 2026) applied to a frozen Llama-3 backbone on GSM8K mathematical reasoning tasks. Each file (log_N) is a serialised PyTorch file containing the KV cache states across refinement steps for a single GSM8K example.

File format. Each log file contains a list of refinement steps. Each step is a list of per-layer dictionaries with the following keys:
- k_cache [B, H, S, D]: key cache tensor
- k_update [B, H, S, D]: key update vector
- v_cache [B, H, S, D]: value cache tensor
- v_update [B, H, S, D]: value update vector
- mask [B, S]: active-token mask
Here B is the batch size (1), H the number of attention heads, S the sequence length, and D the head dimension.

Usage. These files are the inputs to src/kv_motion_figure.py and src/plot_grand_average.py in the accompanying code repository (https://github.com/zfountas/predictive-forgetting). To reproduce panels (a)–(f) of Figure 5, place the files in sample_data/ and run bash scripts/run_figure5.sh. Bootstrap statistics derived from these logs are used for Figure 5e, the grand-average plot. Place them in stats_cache/ as CSV files with the columns layer, ratio, and ci_width.

Compute context. Generating these logs took approximately one week of GPU computation on a multi-GPU server running Llama-3-8B-Instruct. The logs are provided so that Figure 5 can be reproduced without re-running the full LLM pipeline. Full details of the consolidation protocol are given in the Methods section of the paper (Large Language Model Experiments subsection).

Coverage. This deposit contains N = 100 log files, log_9 through log_108. The index in each file name is the corresponding GSM8K example index.
These files are a subset of the logs used in the N = 1,318 aggregate analysis reported in Figure 5e. The full set of logs is available upon reasonable request to the corresponding author (zafeirios.fountas@huawei.com).
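The file layout and coverage described above can be checked with a short Python sketch before running scripts/run_figure5.sh. This is a minimal sketch under stated assumptions, not part of the accompanying repository. It makes three assumptions:
- The logs were written with torch.save. The deposit only says they are serialised PyTorch files.
- The file names are log_N, possibly with an extension.
- The ratio and ci_width columns of the stats_cache/ CSV files are numeric.

All helper names (expected_log_names, missing_logs, load_log, check_log, read_stats_csv) are hypothetical.

```python
import csv
import io
from pathlib import Path

# Keys documented for every per-layer dictionary in a log file.
EXPECTED_KEYS = ("k_cache", "k_update", "v_cache", "v_update", "mask")
# Columns documented for the bootstrap CSV files in stats_cache/.
STATS_COLUMNS = ("layer", "ratio", "ci_width")


def expected_log_names(first=9, last=108):
    """Names of the logs in this deposit: log_9 ... log_108 inclusive (100 files)."""
    return [f"log_{i}" for i in range(first, last + 1)]


def missing_logs(directory):
    """Expected logs absent from `directory` (e.g. sample_data/), matched by file stem."""
    present = {p.stem for p in Path(directory).iterdir() if p.is_file()}
    return [name for name in expected_log_names() if name not in present]


def load_log(path):
    """ASSUMPTION: the file was written with torch.save, so torch.load can read it."""
    import torch  # lazy import: the structural checks below do not need PyTorch

    # map_location="cpu" loads the tensors without requiring a GPU.
    return torch.load(path, map_location="cpu")


def check_log(steps):
    """Check the steps -> layers -> dict nesting and the documented tensor shapes.

    Works on any values exposing `.shape` (e.g. torch.Tensor). Returns
    (num_steps, num_layers) and raises ValueError on the first mismatch.
    """
    num_layers = None
    for step in steps:
        if num_layers is None:
            num_layers = len(step)
        elif len(step) != num_layers:
            raise ValueError("layer count differs between refinement steps")
        for layer in step:
            missing = set(EXPECTED_KEYS) - set(layer)
            if missing:
                raise ValueError(f"missing keys: {sorted(missing)}")
            shape = tuple(layer["k_cache"].shape)
            if len(shape) != 4:
                raise ValueError(f"k_cache is not [B, H, S, D]: {shape}")
            b, _, s, _ = shape
            # k_update, v_cache and v_update are all documented as [B, H, S, D].
            for key in ("k_update", "v_cache", "v_update"):
                if tuple(layer[key].shape) != shape:
                    raise ValueError(f"{key} shape differs from k_cache {shape}")
            if tuple(layer["mask"].shape) != (b, s):
                raise ValueError(f"mask is not [B, S] = {(b, s)}")
    return len(steps), num_layers or 0


def read_stats_csv(text):
    """Parse a stats_cache/ CSV with the documented columns layer, ratio, ci_width.

    ASSUMPTION: ratio and ci_width are numeric; layer is kept as the raw string.
    """
    reader = csv.DictReader(io.StringIO(text))
    absent = [c for c in STATS_COLUMNS if c not in (reader.fieldnames or [])]
    if absent:
        raise ValueError(f"missing columns: {absent}")
    return [(row["layer"], float(row["ratio"]), float(row["ci_width"])) for row in reader]
```

Once all 100 files are in place, missing_logs("sample_data") should return an empty list. Calling check_log(load_log(path)) on any single log should return its number of refinement steps and layers. The structural checks accept any object with a .shape, so they can be exercised without PyTorch installed.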
Created:
2026-03-17



