LUNCH: adaptive balancing of continual learning via hyperparameter uncertainty
中国科学数据 · Updated 2026-03-09 · Indexed 2026-04-25
Download link:
https://www.sciengine.com/AA/doi/10.1007/s11432-024-4570-7
Resource description:
Continual learning (CL) is characterized by learning sequentially arriving tasks and behaving as if they were observed simultaneously. To prevent catastrophic forgetting of old tasks when learning new ones, representative CL methods usually add loss terms (e.g., regularization and replay) that balance the contributions of old and new tasks, each modulated by a deterministic hyperparameter. However, this strategy struggles to accommodate real-time changes in data distributions and lacks robustness to subsequent unseen tasks, especially in online scenarios where CL is performed on a one-pass data stream. Inspired by adaptive weighting in multi-task learning, we propose an approach named learning uncertain hyperparameters (LUNCH) for adaptive balancing of task contributions in CL. Specifically, we formulate each CL-relevant hyperparameter as a function of an optimizable uncertainty under the homoscedastic assumption and ensure its training stability through an exponential moving average of the network parameters. We further devise an evaluation protocol that moderately adjusts the hyperparameter values and reports their impact on performance, so as to analyze sensitivity to the sub-optimal values encountered in realistic applications. We perform extensive experiments to demonstrate the effectiveness and robustness of our approach, which significantly improves online CL in a plug-in manner (e.g., by up to 11.26% and 5.64% on Split CIFAR-100 and Split Mini-ImageNet, respectively) as well as offline CL. Our code is included in the Supplementary Materials for examination and will be released upon acceptance.
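The description above does not give the exact LUNCH formulation, so the following is only a minimal sketch of the general idea it builds on: homoscedastic-uncertainty weighting as commonly used in multi-task learning, where each loss term L_i is combined as 0.5 * exp(-s_i) * L_i + 0.5 * s_i with a learnable s_i = log(sigma_i^2). All function names (`weighted_loss`, `log_var_grads`, `effective_weights`, `ema_update`, `fit_log_vars`), the fixed toy loss values, and the decay and learning-rate settings are illustrative assumptions, not taken from the authors' code.

```python
import math


def weighted_loss(losses, log_vars):
    """Combine loss terms as sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i.

    s_i = log(sigma_i^2) is the learnable log-variance for term i
    (assumed parameterization, not confirmed by the source).
    """
    return sum(0.5 * math.exp(-s) * l + 0.5 * s for l, s in zip(losses, log_vars))


def log_var_grads(losses, log_vars):
    """Analytic gradient of weighted_loss with respect to each s_i."""
    return [-0.5 * math.exp(-s) * l + 0.5 for l, s in zip(losses, log_vars)]


def effective_weights(log_vars):
    """Weight that multiplies each loss term: 0.5 * exp(-s_i)."""
    return [0.5 * math.exp(-s) for s in log_vars]


def ema_update(ema_params, params, decay=0.99):
    """One exponential-moving-average step over a flat list of parameters."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]


def fit_log_vars(losses, steps=2000, lr=0.05):
    """Gradient descent on s_i while the (positive) loss values are held fixed.

    In real training the losses change every step; holding them fixed here
    only makes the fixed point s_i = log(L_i) easy to see.
    """
    log_vars = [0.0] * len(losses)
    for _ in range(steps):
        grads = log_var_grads(losses, log_vars)
        log_vars = [s - lr * g for s, g in zip(log_vars, grads)]
    return log_vars


if __name__ == "__main__":
    # Hypothetical values for, say, a new-task loss and a replay loss.
    toy_losses = [4.0, 0.25]
    s = fit_log_vars(toy_losses)
    print("log-variances:", s)
    print("effective weights:", effective_weights(s))
    print("combined loss:", weighted_loss(toy_losses, s))
    print("EMA step:", ema_update([1.0, 2.0], [0.0, 4.0], decay=0.9))
```

In this toy setting each weight settles near 1 / (2 * L_i), so the larger loss term is down-weighted and the smaller one is up-weighted automatically instead of being fixed by hand. The `ema_update` helper only illustrates the averaging rule mentioned in the description; how LUNCH applies it to the network parameters is not specified here.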
Created:
2025-09-02



