Supplementary tables for: Dependent variable selection in phylogenetic generalized least squares regression analysis under Pagel’s lambda model

Creator: Dryad
Published: 2026-03-12 18:00:17
License: No description available

DataCite Commons; updated 2026-03-12; indexed 2026-04-25

Download link:

https://datadryad.org/dataset/doi:10.5061/dryad.dncjsxm4v


Description:

Phylogenetic generalized least squares (PGLS) regression is widely used to detect evolutionary correlations. Unlike conventional correlation methods such as Pearson's and Spearman's rank tests, which treat the analyzed traits symmetrically, PGLS regression requires designating one trait as the independent variable and the other as the dependent variable. However, in our PGLS regression analyses (using Pagel’s λ model) of both empirical and simulated datasets, switching the independent and dependent variables yielded many conflicting results. A serious and previously unnoticed problem with PGLS regression is that selecting an inappropriate trait as the dependent variable often produces an erroneous result. To assess correlations in simulated data, we established a gold standard by analyzing changes in traits along phylogenetic branches. Next, we tested seven potential criteria for dependent variable selection: log-likelihood, Akaike information criterion, R², p-value, Pagel’s λ, Blomberg et al.’s K, and the estimated λ in Pagel’s λ model. We determined that the last three criteria performed equally well in selecting the dependent variable and were superior to the other four. For practicality, we suggest using the trait with the higher λ or K value as the dependent variable in future PGLS regressions. That is, when analyzing the evolutionary relationship between two traits, the trait with the stronger phylogenetic signal should be designated as the dependent variable, even if it would logically be the cause in the relationship.
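
The selection rule in the description (use the trait with the higher λ as the dependent variable) can be illustrated with a minimal sketch. This is not the authors' code. It assumes a Brownian-motion covariance matrix `C` derived from the tree is already available, estimates each trait's Pagel's λ by a simple grid search over the likelihood, and picks the trait with the larger estimate. The function names, the grid search, and the toy four-taxon data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def lambda_transform(C, lam):
    """Scale the off-diagonal covariances by lam; keep the diagonal unchanged."""
    C_lam = C * lam
    np.fill_diagonal(C_lam, np.diag(C))
    return C_lam


def log_likelihood(y, C, lam):
    """Profile log-likelihood of one trait under Pagel's lambda (mean and
    variance replaced by their maximum-likelihood estimates)."""
    n = len(y)
    C_lam = lambda_transform(C, lam)
    C_inv = np.linalg.inv(C_lam)
    ones = np.ones(n)
    mean = (ones @ C_inv @ y) / (ones @ C_inv @ ones)  # GLS estimate of the root value
    resid = y - mean
    sigma2 = (resid @ C_inv @ resid) / n               # ML estimate of the rate
    _, logdet = np.linalg.slogdet(C_lam)
    return -0.5 * (n * np.log(2 * np.pi * sigma2) + logdet + n)


def estimate_lambda(y, C, n_grid=101):
    """Grid-search estimate of lambda on [0, 1] (an assumption made for
    simplicity; a numerical optimizer could be used instead)."""
    grid = np.linspace(0.0, 1.0, n_grid)
    lls = [log_likelihood(y, C, lam) for lam in grid]
    return float(grid[int(np.argmax(lls))])


def choose_dependent(traits, C):
    """Return the name of the trait with the highest estimated lambda,
    together with all estimates."""
    lams = {name: estimate_lambda(np.asarray(v, dtype=float), C)
            for name, v in traits.items()}
    return max(lams, key=lams.get), lams


# Hypothetical toy tree ((A,B),(C,D)): sister taxa share 0.9 of the tree height.
C = np.array([[1.0, 0.9, 0.0, 0.0],
              [0.9, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.9],
              [0.0, 0.0, 0.9, 1.0]])
traits = {
    "x": [1.0, 1.1, 5.0, 5.1],  # sister taxa are similar (clade-structured)
    "y": [1.0, 5.0, 1.1, 5.1],  # sister taxa differ (no clade structure)
}
dependent, estimates = choose_dependent(traits, C)
print(dependent, estimates)
```

In this toy example the clade-structured trait `x` receives the higher λ estimate and would therefore be used as the dependent variable. With real data, the estimates would normally come from an established phylogenetics package rather than a hand-written grid search.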

Provider:

Dryad

Created:

2023-06-15
