
Minimization and estimation of the variance of prediction errors for cross-validation designs

DataCite Commons, updated 2020-09-04, indexed 2024-07-25
Download link:
https://figshare.com/articles/Minimization_and_estimation_of_the_variance_of_prediction_errors_for_cross_validation_designs/3124324/1
Resource description:
We consider the mean prediction error of a classification or regression procedure as well as its cross-validation estimates, and investigate the variance of this estimate as a function of an arbitrary cross-validation design. We decompose this variance into a scalar product of coefficients and certain covariance expressions, such that the coefficients depend solely on the resampling design, and the covariances depend solely on the data’s probability distribution. We rewrite this scalar product in such a form that the initially large number of summands can gradually be decreased down to three under the validity of a quadratic approximation to the core covariances. We show an analytical example in which this quadratic approximation holds true exactly. Moreover, in this example, we show that the leave-p-out estimator of the error depends on p only by means of a constant and can, therefore, be written in a much simpler form. Furthermore, there is an unbiased estimator of the variance of K-fold cross-validation, in contrast to a claim in the literature. As a consequence, we can show that balanced incomplete block designs have smaller variance than K-fold cross-validation. In a real data example from the UCI machine learning repository, this property can be confirmed. We finally show how to find balanced incomplete block designs in practice.
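As a rough illustration of what a cross-validation design is, the sketch below treats a design as a list of test-index sets and evaluates a toy predictor on it. The predictor (the mean of the training observations, scored by squared error), the data values, and the function names cv_error, leave_p_out and k_fold are all assumptions made for this illustration; they are not taken from the paper and need not match its analytical example. For this particular toy setup, the leave-p-out estimate works out to the sample variance times the constant (n - p + 1) / (n - p), which gives a feel for the kind of dependence on p that the description mentions.

```python
from itertools import combinations
from statistics import mean, variance


def cv_error(x, test_sets):
    """Average squared prediction error over a cross-validation design.

    Each element of `test_sets` is a tuple of held-out indices; the toy
    predictor is the mean of the remaining (training) observations.
    """
    fold_errors = []
    for test in test_sets:
        held_out = set(test)
        train_mean = mean(v for i, v in enumerate(x) if i not in held_out)
        fold_errors.append(mean((x[i] - train_mean) ** 2 for i in test))
    return mean(fold_errors)


def leave_p_out(n, p):
    """Complete design: every size-p subset is used as a test set once."""
    return list(combinations(range(n), p))


def k_fold(n, k):
    """One K-fold partition into contiguous blocks (assumes k divides n)."""
    size = n // k
    return [tuple(range(j * size, (j + 1) * size)) for j in range(k)]


# Made-up data, for illustration only.
x = [2.0, 3.5, 1.0, 4.0, 2.5, 5.0]
n = len(x)
s2 = variance(x)  # unbiased sample variance

for p in (1, 2, 3):
    estimate = cv_error(x, leave_p_out(n, p))
    constant = (n - p + 1) / (n - p)
    # For the mean predictor the two printed values agree up to rounding.
    print(p, estimate, constant * s2)

# A single K-fold partition uses far fewer test sets than leave-p-out.
print("3-fold:", cv_error(x, k_fold(n, 3)))
```

Leave-p-out uses every possible test set of size p, whereas a single K-fold partition uses only K of them. A balanced incomplete block design sits between the two; constructing one is not attempted in this sketch.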
Provider:
Taylor & Francis
Created:
2016-03-25