Evaluating the stability of decision tree algorithms

Mendeley Data · Updated 2024-01-31 · Indexed 2024-06-27

Download link:

http://digitallibrary.usc.edu/cdm/ref/collection/p15799coll40/id/386854


Description:

Decision trees (DT), one method under the umbrella of Exploratory Data Mining (EDM; McArdle & Ritschard, 2013), have seen increasing use in psychological research. This can be attributed to their easily interpretable tree structure and their propensity to capture interactions among predictors automatically. However, the creation of tree structures comes at the expense of generalizability: slight perturbations in the sample can produce dramatically different results. The purpose of this paper was to investigate this instability and to provide metrics that quantify how likely the results are to generalize. This was examined across five studies using multiple DT algorithms and datasets to determine whether there were differences in stability and what factors accounted for them. There were large differences in both stability and predictive performance across algorithms, with some favoring simple, stable trees and others favoring large, unstable, but highly predictive models. Adding simulated missingness and testing both imputation and surrogate splits produced mixed results: single imputation did not add enough variability to the imputed values, whereas surrogate splits worked well. Finally, this comparison was applied to an empirical example, predicting dementia from various cognitive scales and demographic factors. Here, none of the DT algorithms produced highly stable results, and they had lower accuracy than both multinomial regression and random forests. In sum, these results demonstrate that measuring the stability of DT algorithms is useful for comparing multiple algorithms and methods, giving researchers additional information about both the viability and the usefulness of DT algorithms in their research.
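The instability described above can be illustrated with a small bootstrap experiment. The following is only a minimal sketch under stated assumptions, not the metric or algorithms used in the paper: it fits a depth-one tree (a single Gini-based split) to repeated bootstrap resamples of a synthetic dataset and reports how often the same predictor is chosen at the root. The helper names (`best_root_split`, `root_split_stability`, `make_data`) and the simulated data are hypothetical and exist only for this illustration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_root_split(X, y):
    """Return (feature_index, threshold) minimizing weighted Gini impurity."""
    n = len(y)
    best = (None, None, float("inf"))
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between adjacent observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best[0], best[1]


def root_split_stability(X, y, n_boot=50, seed=0):
    """Refit the root split on bootstrap resamples and tally the chosen feature.

    Returns (most_frequent_feature, its_selection_proportion, counts).
    This proportion is an illustrative stability index, not the paper's metric.
    """
    rng = random.Random(seed)
    n = len(y)
    chosen = Counter()
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feature, _ = best_root_split([X[i] for i in idx], [y[i] for i in idx])
        chosen[feature] += 1
    top_feature, top_count = chosen.most_common(1)[0]
    return top_feature, top_count / n_boot, dict(chosen)


def make_data(n=80, seed=1):
    """Synthetic data (assumed): x2 is a noisy copy of x1, x3 is pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = x1 + rng.gauss(0, 0.5)  # correlated competitor for the root split
        x3 = rng.gauss(0, 1)
        X.append([x1, x2, x3])
        y.append(1 if x1 + rng.gauss(0, 1) > 0 else 0)
    return X, y


X, y = make_data()
feature, proportion, counts = root_split_stability(X, y)
print("most frequent root feature:", feature)
print("selection proportion:", proportion)
print("counts per feature:", counts)
```

A selection proportion near 1 indicates that the root split survives resampling, while a value split between the two correlated predictors reflects the kind of sample-dependent structure the abstract describes. A fuller comparison would also track thresholds, tree size, and predictive accuracy across resamples.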

Created:

2024-01-31
