Sequential Learning of Regression Models by Penalized Estimation
Source: tandf.figshare.com; updated 2023-05-30; indexed 2025-03-22
Download link:
https://tandf.figshare.com/articles/dataset/Sequential_learning_of_regression_models_by_penalized_estimation/19100092/2
Resource description:
When data arrive in a sequence of two or more datasets, modeling on the most recent dataset should take previous datasets into account. We specifically investigate a strategy for regression modeling when parameter estimates from previous data can be used as anchoring points, yet may not be available for all parameters; thus, covariance information cannot be reused. A procedure that updates through targeted penalized estimation, which shrinks the estimator toward a nonzero value, is presented. The parameter estimate from the previous data serves as this nonzero value when an update is sought from novel data. This naturally extends to a sequence of datasets with the same response, but potentially only partial overlap in covariates. The iteratively updated regression parameter estimator is shown to be asymptotically unbiased and consistent. The penalty parameter is chosen through constrained cross-validated log-likelihood optimization. The constraint bounds the amount of shrinkage of the updated estimator toward the current one from below. The bound aims to preserve the (updated) estimator's goodness of fit on all-but-the-novel data. The proposed approach is compared to other regression modeling procedures. Finally, it is illustrated on an epidemiological study where the data arrive in batches with different covariate availability and the model is refitted with the availability of a novel batch. Supplementary materials for this article are available online.
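The update step described above can be illustrated with a minimal sketch. It rests on several assumptions that are not fixed by this description. It assumes a Gaussian linear model with centered data and no intercept. It also assumes a squared (ridge-type) targeted penalty, i.e. the update minimizes ||y - X b||^2 + lam * ||b - target||^2, which has the closed form b = (X'X + lam I)^(-1) (X'y + lam target). The target is the previous estimate. Covariates without a previous estimate get target 0 (an assumption). The penalty is chosen by K-fold cross-validated log-likelihood over a grid restricted to lam >= lam_min. Here `lam_min` is a hypothetical user-supplied lower bound standing in for the paper's data-driven constraint. All function names (`targeted_ridge`, `expand_target`, `cv_loglik`, `select_lambda`) are illustrative, not the authors' code. Only the new batch and the previous estimates are used, never the old design matrix. This mirrors the setting in which covariance information cannot be reused.

```python
import numpy as np


def targeted_ridge(X, y, lam, target):
    """Minimise ||y - X b||^2 + lam * ||b - target||^2 via its closed form."""
    p = X.shape[1]
    A = X.T @ X + lam * np.eye(p)
    return np.linalg.solve(A, X.T @ y + lam * target)


def expand_target(prev_beta, prev_names, new_names):
    """Align previous estimates with the new covariate set.

    Assumption: covariates with no previous estimate are shrunk toward 0.
    """
    lookup = dict(zip(prev_names, prev_beta))
    return np.array([float(lookup.get(name, 0.0)) for name in new_names])


def cv_loglik(X, y, lam, target, k=5, seed=0):
    """K-fold cross-validated Gaussian log-likelihood of the targeted fit."""
    idx = np.random.default_rng(seed).permutation(len(y))
    total = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        b = targeted_ridge(X[train], y[train], lam, target)
        # Residual variance estimated on the training part of the split.
        sigma2 = np.mean((y[train] - X[train] @ b) ** 2)
        resid = y[fold] - X[fold] @ b
        total += (-0.5 * len(fold) * np.log(2 * np.pi * sigma2)
                  - 0.5 * np.sum(resid ** 2) / sigma2)
    return total


def select_lambda(X, y, target, grid, lam_min):
    """Maximise CV log-likelihood over grid values with lam >= lam_min.

    lam_min is a hypothetical lower bound on shrinkage toward the target.
    """
    feasible = [lam for lam in grid if lam >= lam_min]
    if not feasible:
        raise ValueError("no grid value satisfies lam >= lam_min")
    scores = [cv_loglik(X, y, lam, target) for lam in feasible]
    return feasible[int(np.argmax(scores))]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Batch 1: only covariates x1 and x2 are observed.
    X1 = rng.normal(size=(80, 2))
    y1 = X1 @ np.array([1.0, -0.5]) + rng.normal(scale=0.5, size=80)
    beta1 = targeted_ridge(X1, y1, lam=1.0, target=np.zeros(2))
    # Batch 2: x3 becomes available; beta1 acts as the anchoring point.
    X2 = rng.normal(size=(40, 3))
    y2 = X2 @ np.array([1.0, -0.5, 0.3]) + rng.normal(scale=0.5, size=40)
    target = expand_target(beta1, ["x1", "x2"], ["x1", "x2", "x3"])
    lam = select_lambda(X2, y2, target, np.logspace(-2, 3, 20), lam_min=0.5)
    print("chosen lambda:", lam)
    print("updated estimate:", targeted_ridge(X2, y2, lam, target))
```

With lam = 0 the update reduces to ordinary least squares on the new batch. As lam grows, the estimate collapses onto the previous one. The lower bound on lam therefore controls how far the update may drift from what the earlier data supported.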
Provided by:
Taylor & Francis



