Reproducible Hyperparameter Optimization
Taylor & Francis Group | Updated 2021-09-29 | Indexed 2026-04-16
Download link:
https://tandf.figshare.com/articles/dataset/Reproducible_Hyperparameter_Optimization/14916681/1
Description:
A key issue in current machine learning research is the lack of reproducibility. We illustrate what role hyperparameter search plays in this problem and how regular hyperparameter search methods can lead to a large variance in outcomes due to non-deterministic model training during hyperparameter optimization. The variation in outcomes poses a problem both for reproducibility of the hyperparameter search itself and comparisons of different methods each optimized using hyperparameter search. In addition, the fact that hyperparameter search may result in non-optimal hyperparameter settings may affect other studies since hyperparameter settings are often copied from previously published research. To remedy this issue we define the mean prediction error across model training runs as the objective for the hyperparameter search. We then propose a hypothesis testing procedure that makes inference on the mean performance of each hyperparameter setting and results in an equivalence class of hyperparameter settings that are not distinguishable in performance. We further embed this procedure into a group sequential testing framework to increase efficiency in terms of the average number of model training replicates required. Empirical results on machine learning benchmarks show that at equal computation the proposed method reduces the variation in hyperparameter search outcomes by up to 90 percent while resulting in equal or lower mean prediction errors when compared to standard random search and Bayesian optimization. Moreover, the sequential testing framework successfully reduces computation while preserving performance of the method. Code to reproduce the results is available in the supplementary materials and online<sup>1</sup>.
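The idea in the description (score each hyperparameter setting by its mean error over repeated training runs, then keep every setting that cannot be statistically separated from the best one) can be illustrated with a minimal sketch. This is not the authors' procedure: the function `noisy_validation_error`, the learning-rate grid, the Welch t statistic, and the fixed `critical_value=2.0` are all assumptions chosen for illustration, and the sketch omits both multiplicity control and the group sequential testing described above.

```python
import math
import random
import statistics


def noisy_validation_error(learning_rate, seed):
    """Hypothetical stand-in for one non-deterministic training run."""
    # Seed per (setting, replicate) so every run draws its own noise.
    rng = random.Random(f"{learning_rate}-{seed}")
    # Toy error surface with its minimum at learning_rate = 1e-2.
    true_error = 0.10 + 0.5 * (math.log10(learning_rate) + 2.0) ** 2
    return true_error + rng.gauss(0.0, 0.02)


def replicate(learning_rate, n_replicates):
    """Repeat training n_replicates times for one setting."""
    return [noisy_validation_error(learning_rate, i) for i in range(n_replicates)]


def welch_t(a, b):
    """Welch t statistic for mean(a) - mean(b)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0.0:
        return 0.0 if statistics.mean(a) == statistics.mean(b) else math.inf
    return (statistics.mean(a) - statistics.mean(b)) / se


def equivalence_class(errors_by_setting, critical_value=2.0):
    """Return the setting with the lowest mean error and all settings
    whose mean error is not significantly higher (assumed cutoff)."""
    best = min(errors_by_setting, key=lambda s: statistics.mean(errors_by_setting[s]))
    keep = [best]
    for setting, errors in errors_by_setting.items():
        if setting == best:
            continue
        # One-sided comparison: is this setting clearly worse than the best?
        if welch_t(errors, errors_by_setting[best]) < critical_value:
            keep.append(setting)
    return best, sorted(keep)


# Assumed candidate grid; 10 training replicates per setting.
grid = [1e-4, 1e-3, 1e-2, 1.2e-2, 1e-1]
errors = {lr: replicate(lr, n_replicates=10) for lr in grid}
best, keep = equivalence_class(errors)
print("lowest mean error:", best)
print("not distinguishable from it:", keep)
```

Reporting the whole set in `keep`, rather than a single winner picked from one noisy run per setting, is what makes the outcome of the search less sensitive to training randomness in this toy setup.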
Provided by:
Gillen, Daniel L.; Hertel, Lars; Baldi, Pierre
Created:
2021-09-29



