Data from: Comparative analysis of principal components can be misleading

DataONE · Updated 2015-04-07 · Indexed 2024-06-27
Download link:
https://search.dataone.org/view/null
Description:
Most existing methods for modeling trait evolution are univariate, while researchers are often interested in investigating evolutionary patterns and processes across multiple traits. Principal components analysis (PCA) is commonly used to reduce the dimensionality of multivariate data so that univariate trait models can be fit to the individual principal components. The problem with using standard PCA on phylogenetically structured data has been pointed out previously, yet it continues to be widely used in the literature. Here we demonstrate precisely how using standard PCA can mislead inferences: the first few principal components of traits evolved under constant-rate multivariate Brownian motion will appear to have evolved via an “early burst” process. A phylogenetic PCA (pPCA) has been proposed to alleviate these issues. However, when the true model of trait evolution deviates from the model assumed in the calculation of the pPCA axes, we find that pPCA suffers from artifacts similar to those of standard PCA. We show that datasets with high effective dimensionality are particularly likely to lead to erroneous inferences. Ultimately, all of the problems we report stem from the same underlying issue: by considering only the first few principal components as univariate traits, we are effectively examining a biased sample of a multivariate pattern. These results highlight the need for truly multivariate phylogenetic comparative methods. As these methods are still being developed, we discuss potential alternative strategies for using and interpreting models fit to univariate axes of multivariate data.
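
To make the distinction between standard PCA and pPCA concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' code or simulation design. It assumes a balanced, ultrametric binary tree of unit height, simulates traits under constant-rate multivariate Brownian motion, and then computes both a standard PCA and one common formulation of pPCA, in which the data are centered on the GLS (phylogenetic) mean and the covariance is weighted by the inverse of the phylogenetic covariance matrix C. All function names (`balanced_tree_cov`, `simulate_bm`, `standard_pca`, `phylo_pca`) are hypothetical and introduced only for illustration.

```python
import numpy as np


def balanced_tree_cov(depth):
    """Shared-path-length matrix C for a balanced binary tree of height 1.

    Assumption: tips are labelled 0..2**depth - 1 and every branch has
    length 1/depth, so two tips share as many branches as their binary
    labels share leading bits.
    """
    n = 2 ** depth
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            shared_levels = depth - (i ^ j).bit_length()
            C[i, j] = shared_levels / depth
    return C


def simulate_bm(C, R, rng):
    """Draw an n x p trait matrix with row covariance C and rate matrix R."""
    n, p = C.shape[0], R.shape[0]
    Z = rng.standard_normal((n, p))
    return np.linalg.cholesky(C) @ Z @ np.linalg.cholesky(R).T


def _sorted_eig(S):
    """Eigen-decomposition of a symmetric matrix, largest eigenvalue first."""
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]


def standard_pca(X):
    """Ordinary PCA: arithmetic centering, unweighted covariance."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / (X.shape[0] - 1)
    evals, evecs = _sorted_eig(S)
    return evals, evecs, Xc @ evecs


def phylo_pca(X, C):
    """pPCA sketch: GLS centering and a C^-1-weighted rate matrix."""
    n = X.shape[0]
    Cinv = np.linalg.inv(C)
    ones = np.ones((n, 1))
    # Phylogenetic (GLS) mean of each trait, shape (1, p).
    a = (ones.T @ Cinv @ X) / (ones.T @ Cinv @ ones)
    Xc = X - a
    # Estimated evolutionary rate matrix under Brownian motion.
    R_hat = Xc.T @ Cinv @ Xc / (n - 1)
    evals, evecs = _sorted_eig(R_hat)
    return evals, evecs, Xc @ evecs


rng = np.random.default_rng(1)
C = balanced_tree_cov(depth=5)           # 32 tips
X = simulate_bm(C, np.eye(10), rng)      # 10 uncorrelated traits, equal rates
pca_evals, _, pca_scores = standard_pca(X)
ppca_evals, _, ppca_scores = phylo_pca(X, C)
print("standard PCA eigenvalues:", np.round(pca_evals[:3], 3))
print("pPCA eigenvalues:        ", np.round(ppca_evals[:3], 3))
```

The sketch only builds the two sets of axes. Reproducing the artifact described above would additionally require fitting univariate models (for example, Brownian motion versus early burst) to the leading component scores, which is outside the scope of this illustration.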
Created:
2015-04-07