Data from: R2s for correlated data: phylogenetic models, LMMs, and GLMMs
Source: DataONE | Updated: 2018-09-12 | Indexed: 2024-06-08
Download link:
https://search.dataone.org/view/null
Description:
Many researchers want to report an R2 to measure the variance explained by a model. When the model includes correlation among data, such as phylogenetic models and mixed models, defining an R2 faces two conceptual problems. (i) It is unclear how to measure the variance explained by predictor (independent) variables when the model contains covariances. (ii) Researchers may want the R2 to include the variance explained by the covariances by asking questions such as “How much of the data is explained by phylogeny?” Here, I investigate three R2s for phylogenetic and mixed models. R2resid is an extension of the ordinary least-squares R2 that weights residuals by variances and covariances estimated by the model; it is closely related to R2glmm presented by Nakagawa and Schielzeth (2013). R2pred is based on predicting each residual from the fitted model and computing the variance between observed and predicted values. R2lik is based on the likelihood of fitted models and therefore reflects the amount of information that the models contain. These three R2s are formulated as partial R2s, making it possible to compare the contributions of predictor variables and variance components (phylogenetic signal and random effects) to the fit of models. Because partial R2s compare a full model with a reduced model without components of the full model, they are distinct from marginal R2s that partition additive components of the variance. The properties of the R2s for phylogenetic models were assessed using simulations for continuous and binary response data (phylogenetic generalized least squares and phylogenetic logistic regression). Because the R2s are designed broadly for any model for correlated data, the R2s were also compared for LMMs and GLMMs. R2resid, R2pred, and R2lik all have similar performance in describing the variance explained by different components of models. 
However, R2pred gives the most direct answer to the question of how much variance in the data is explained by a model. R2resid is most appropriate for comparing models fit to different datasets, because it does not depend on sample sizes. And R2lik is most appropriate to assess the importance of different components within the same model applied to the same data, because it is most closely associated with statistical significance tests.
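To make the idea of a likelihood-based partial R2 concrete, here is a minimal sketch. It assumes the common likelihood-ratio form 1 - exp(-2/n * (logLik_full - logLik_reduced)); the description above does not state the exact formula, so treating this as the definition of R2lik is an assumption. The helper names (`gaussian_loglik`, `partial_r2_lik`, `ols_residuals`) and the toy data are hypothetical and are not taken from the dataset. The example uses uncorrelated data and ordinary least squares only; for phylogenetic or mixed models the two log-likelihoods would instead come from the fitted full and reduced models.

```python
import math

def gaussian_loglik(residuals):
    """Maximized Gaussian log-likelihood, using the ML variance SSE/n."""
    n = len(residuals)
    sse = sum(r * r for r in residuals)
    sigma2 = sse / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)

def partial_r2_lik(loglik_full, loglik_reduced, n):
    """Likelihood-ratio partial R2 (assumed form, see lead-in)."""
    return 1.0 - math.exp(-2.0 / n * (loglik_full - loglik_reduced))

def ols_residuals(x, y):
    """Residuals of a simple linear regression y ~ 1 + x (closed form)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]

res_full = ols_residuals(x, y)               # full model: intercept + slope
my = sum(y) / len(y)
res_reduced = [yi - my for yi in y]          # reduced model: intercept only

r2_lik = partial_r2_lik(gaussian_loglik(res_full),
                        gaussian_loglik(res_reduced), len(y))

# Ordinary least-squares R2 for comparison.
r2_ols = 1.0 - sum(r * r for r in res_full) / sum(r * r for r in res_reduced)

print(r2_lik, r2_ols)
```

For Gaussian models fit by maximum likelihood, the likelihood ratio reduces to the ratio of residual sums of squares, so in this uncorrelated case the two printed values coincide. Comparing a full model against a reduced model that drops a variance component, rather than a predictor, would follow the same pattern.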
Created:
2018-09-12



