Supplementary tables for: Dependent variable selection in phylogenetic generalized least squares regression analysis under Pagel’s lambda model
Source: DataCite Commons. Updated: 2026-03-12. Indexed: 2026-04-25.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.dncjsxm4v
Description:
Phylogenetic generalized least squares (PGLS) regression is widely used to
detect evolutionary correlations. Unlike conventional correlation methods
such as Pearson's test and Spearman's rank test, which treat the analyzed
traits symmetrically, PGLS requires designating one trait as the independent
variable and the other as the dependent variable. However, in our PGLS
regression analyses (using Pagel’s λ model) of both empirical and
simulated datasets, switching independent and dependent variables yielded
many conflicting results. A serious and previously unnoticed problem with
PGLS regression is that selecting an inappropriate trait as the dependent
variable will often produce an erroneous result. To assess correlations
in simulated data, we established a gold standard by analyzing changes in
traits along phylogenetic branches. Next, we tested seven potential
criteria for dependent variable selection: log-likelihood, Akaike
information criterion, R², p-value, Pagel’s λ, Blomberg et al.’s K, and
the estimated λ in Pagel’s λ model. We determined that the last three
criteria performed equally well in selecting the dependent variable and
were superior to the other four. For practicality, we suggest using the
trait with a higher λ or K value as the dependent variable in future PGLS
regressions. In analyzing the evolutionary relationship between two
traits, we should designate the trait with the stronger phylogenetic signal
as the dependent variable, even if it would logically be the cause in
the relationship.
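
The selection rule suggested in the description can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes a phylogenetic covariance matrix C under Brownian motion (shared branch lengths between tips), estimates Pagel's λ for each trait separately by a simple grid search over the maximum-likelihood profile, and returns the trait with the higher λ as the candidate dependent variable. The function names, the grid resolution, and the four-tip example data are all hypothetical.

```python
import numpy as np

def lambda_transform(C, lam):
    """Pagel's lambda transform: scale off-diagonal covariances by lam,
    leaving the diagonal (tip variances) unchanged."""
    C = np.asarray(C, dtype=float)
    V = lam * C
    np.fill_diagonal(V, np.diag(C))
    return V

def log_likelihood(y, C, lam):
    """Profile log-likelihood of one trait under covariance sigma^2 * V(lam),
    with the mean and sigma^2 replaced by their ML estimates."""
    y = np.asarray(y, dtype=float)
    n = y.size
    V = lambda_transform(C, lam)
    ones = np.ones(n)
    # Generalized least squares estimate of the root (mean) value.
    mean = (ones @ np.linalg.solve(V, y)) / (ones @ np.linalg.solve(V, ones))
    resid = y - mean
    sigma2 = (resid @ np.linalg.solve(V, resid)) / n
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (n * np.log(2.0 * np.pi * sigma2) + logdet + n)

def estimate_lambda(y, C, steps=101):
    """Grid-search ML estimate of lambda on [0, 1] (an assumed, coarse optimizer)."""
    grid = np.linspace(0.0, 1.0, steps)
    return float(max(grid, key=lambda lam: log_likelihood(y, C, lam)))

def choose_dependent(traits, C):
    """Return the name of the trait with the highest estimated lambda."""
    return max(traits, key=lambda name: estimate_lambda(traits[name], C))

# Hypothetical four-tip tree ((t1,t2),(t3,t4)): depth 1.0, sister tips share 0.5.
C_example = [
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]
traits_example = {
    "x": [1.0, 1.1, 3.0, 3.2],  # sister tips are similar
    "y": [1.0, 3.0, 1.2, 3.1],  # sister tips differ
}

if __name__ == "__main__":
    for name, values in traits_example.items():
        print(name, estimate_lambda(values, C_example))
    print("dependent variable:", choose_dependent(traits_example, C_example))
```

In practice the λ or K values would come from an established phylogenetics package rather than a grid search; the sketch only shows how the comparison between the two traits drives the choice of dependent variable.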
Provider:
Dryad
Created:
2023-06-15



