Data from: Assessing among-lineage variability in phylogenetic imputation of functional trait datasets
Source: DataONE. Updated: 2018-01-23. Indexed: 2024-06-25.
Download link:
https://search.dataone.org/view/null
Description:
Phylogenetic imputation has recently emerged as a potentially powerful tool for predicting missing data in functional trait datasets. Understanding the limitations of phylogenetic modelling in predicting trait values is therefore critical if imputed values are to be used in subsequent analyses. Previous studies have focused on the relationship between phylogenetic signal and clade-level prediction accuracy, yet variability in prediction accuracy among individual tips of phylogenies remains largely unexplored. Here, we used simulations of trait evolution along the branches of phylogenetic trees to show how the accuracy of phylogenetic imputations is influenced by the combined effects of (1) the amount of phylogenetic signal in the traits and (2) the branch length of the tips to be imputed. Specifically, we conducted cross-validation trials to estimate the variability in prediction accuracy among individual tips on the phylogenies (hereafter “tip-level accuracy”). We found that under a Brownian motion model of evolution (BM, Pagel's λ = 1), tip-level accuracy rapidly decreased with increasing tip branch length, and only tips whose branch length was approximately 10% or less of the total tree height showed consistently accurate predictions (i.e. cross-validation R-squared > 0.75). When phylogenetic signal was weak, the effect of tip branch length was reduced, becoming negligible for traits simulated with λ < 0.7, where accuracy was in any case low. Our study shows that variability in prediction accuracy among individual tips of the phylogeny should be considered when evaluating the reliability of phylogenetically imputed trait values. To address this challenge, we describe a Monte Carlo-based method that allows one to estimate the expected tip-level accuracy of phylogenetic predictions for continuous traits.
Our approach identifies gaps in functional trait datasets for which phylogenetic imputation performs poorly, and will help ecologists to design more efficient trait collection campaigns by focusing resources on lineages whose trait values are more uncertain.
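The Monte Carlo idea described above can be illustrated with a minimal Python sketch. This is not the authors' implementation; it only shows the general technique under stated assumptions. The tree is represented by its Brownian motion covariance matrix (shared root-to-ancestor path lengths), Pagel's λ is applied by scaling the off-diagonal entries, each tip is predicted from the remaining tips with a GLS estimate of the root state, and tip-level accuracy is taken to be 1 - SSE/SST across simulations. The function names, the four-tip example tree, and this particular R-squared definition are all hypothetical choices made for illustration.

```python
import numpy as np


def lambda_transform(C, lam):
    """Apply Pagel's lambda: scale off-diagonal covariances, keep the diagonal."""
    out = C * lam
    np.fill_diagonal(out, np.diag(C))
    return out


def tip_level_r2(C, lam=1.0, n_sims=2000, seed=0):
    """Monte Carlo estimate of leave-one-out prediction R-squared for each tip.

    C is the Brownian motion covariance matrix of the tree (assumed input).
    """
    rng = np.random.default_rng(seed)
    V = lambda_transform(np.asarray(C, dtype=float), lam)
    n = V.shape[0]
    # Simulated traits: one row per simulation, one column per tip (root state 0).
    X = rng.multivariate_normal(np.zeros(n), V, size=n_sims)
    r2 = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        V_oo = V[np.ix_(keep, keep)]          # covariance among observed tips
        v_io = V[i, keep]                     # covariance of target with observed
        w = np.linalg.solve(V_oo, v_io)       # regression weights on observed tips
        a = np.linalg.solve(V_oo, np.ones(n - 1))
        obs = X[:, keep]
        mu = obs @ a / a.sum()                # GLS root estimate, per simulation
        pred = mu + (obs - mu[:, None]) @ w   # conditional expectation of the tip
        sse = np.sum((X[:, i] - pred) ** 2)
        sst = np.sum((X[:, i] - X[:, i].mean()) ** 2)
        r2[i] = 1.0 - sse / sst
    return r2


# Hypothetical ultrametric tree of height 1:
# ((A:0.1,B:0.1):0.9,(C:0.9,D:0.9):0.1)
# A and B have short tip branches; C and D have long ones.
C_example = np.array([
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.1],
    [0.0, 0.0, 0.1, 1.0],
])

if __name__ == "__main__":
    for lam in (1.0, 0.5):
        print(lam, np.round(tip_level_r2(C_example, lam=lam), 2))
```

In this toy setting the short-branched tips A and B should score much higher than the long-branched tips C and D when λ = 1, and the gap should shrink as λ decreases. R-squared computed this way can be negative for poorly predicted tips, because the root state is itself estimated from very few tips.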
Created:
2018-01-23



