Graphical and Computational Tools to Guide Parameter Choice for the Cluster Weighted Robust Model
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://figshare.com/articles/dataset/Graphical_and_computational_tools_to_guide_parameter_choice_for_the_cluster_weighted_robust_model/21657766
Description:
The Cluster Weighted Robust Model (CWRM) is a recently introduced methodology to robustly estimate mixtures of regressions with random covariates. The CWRM allows users to flexibly perform regression clustering, safeguarding it against data contamination and spurious solutions. Nonetheless, the resulting solution depends on the chosen number of components in the mixture, the percentage of impartial trimming, and the degree of heteroscedasticity of the errors around the regression lines and of the clusters in the explanatory variables. Appropriate model selection is therefore crucial. Such a complex modeling task may generate several “legitimate” solutions, each derived from a distinct hyperparameter specification. The present article introduces a two-step monitoring procedure to help users effectively explore such a vast model space. The first phase uncovers the most appropriate percentages of trimming, whilst the second phase explores the whole set of solutions, conditioning on the outcome of the first step. The final output singles out a set of “top” solutions, whose optimality, stability and validity are assessed. Novel graphical and computational tools, specifically tailored for the CWRM framework, help the user make an educated choice among the optimal solutions. Three examples on real datasets showcase the proposal in action. Supplementary files for this article are available online.
Created:
2022-12-01



