Large Scale Prediction with Decision Trees
Indexed by NIAID Data Ecosystem on 2026-05-01
Download link:
https://figshare.com/articles/dataset/Large_Scale_Prediction_with_Decision_Trees/24552254
Description:
This article shows that decision trees constructed with Classification and Regression Trees (CART) and C4.5 methodology are consistent for regression and classification tasks, even when the number of predictor variables grows sub-exponentially with the sample size, under natural 0-norm and 1-norm sparsity constraints. The theory applies to a wide range of models, including (ordinary or logistic) additive regression models with component functions that are continuous, of bounded variation, or, more generally, Borel measurable. Consistency holds for arbitrary joint distributions of the predictor variables, thereby accommodating continuous, discrete, and/or dependent data. Finally, we show that these qualitative properties of individual trees are inherited by Breiman’s random forests. A key step in the analysis is the establishment of an oracle inequality, which allows for a precise characterization of the goodness of fit and complexity tradeoff for a mis-specified model. Supplementary materials for this article are available online.
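As a rough illustration of the sparse, many-predictor setting described above, the following sketch runs a single CART-style squared-error split search on simulated data. It is not taken from the article or its supplementary materials: the function name `best_split`, the sample size, the number of predictors, and the step-function signal are all assumptions chosen for the example, and only the first predictor carries signal.

```python
import random


def best_split(X, y):
    """Return (feature, threshold, sse) for the one split minimizing squared error.

    Hypothetical helper: an exhaustive search over all features and all
    midpoints between consecutive distinct values, as in a CART regression stump.
    """
    n, p = len(X), len(X[0])
    total_sum = sum(y)
    total_sq = sum(v * v for v in y)
    best = (None, None, float("inf"))
    for j in range(p):
        # Sort sample indices by the value of feature j.
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        left_sq = 0.0
        for k in range(n - 1):
            i = order[k]
            left_sum += y[i]
            left_sq += y[i] * y[i]
            nxt = order[k + 1]
            if X[i][j] == X[nxt][j]:
                continue  # cannot split between tied values
            n_left = k + 1
            n_right = n - n_left
            right_sum = total_sum - left_sum
            right_sq = total_sq - left_sq
            # Within-node sum of squared errors on each side of the split.
            sse = (left_sq - left_sum ** 2 / n_left) + (
                right_sq - right_sum ** 2 / n_right
            )
            if sse < best[2]:
                best = (j, (X[i][j] + X[nxt][j]) / 2, sse)
    return best


# Assumed toy data: 200 samples, 50 predictors, signal only in predictor 0.
rng = random.Random(0)
n, p = 200, 50
X = [[rng.random() for _ in range(p)] for _ in range(n)]
y = [2.0 * (row[0] > 0.5) + rng.gauss(0, 0.1) for row in X]

feature, threshold, sse = best_split(X, y)
print(feature, round(threshold, 3))
```

In this toy setup the split lands on the one informative predictor even though 49 irrelevant ones compete with it. That is the kind of behavior the consistency results concern, although a single simulated split does not demonstrate the theory.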
Created:
2023-11-13



