
Data from: Assessing among-lineage variability in phylogenetic imputation of functional trait datasets

Source: DataONE · Updated 2018-01-23 · Indexed 2024-06-25
Download link:
https://search.dataone.org/view/null
Description:
Phylogenetic imputation has recently emerged as a potentially powerful tool for predicting missing data in functional trait datasets. As such, understanding the limitations of phylogenetic modelling in predicting trait values is critical if we are to use the imputed values in subsequent analyses. Previous studies have focused on the relationship between phylogenetic signal and clade-level prediction accuracy, yet variability in prediction accuracy among individual tips of phylogenies remains largely unexplored. Here, we used simulations of trait evolution along the branches of phylogenetic trees to show how the accuracy of phylogenetic imputations is influenced by the combined effects of (1) the amount of phylogenetic signal in the traits and (2) the branch length of the tips to be imputed. Specifically, we conducted cross-validation trials to estimate the variability in prediction accuracy among individual tips on the phylogenies (hereafter “tip-level accuracy”). We found that under a Brownian motion model of evolution (BM, Pagel's λ = 1), tip-level accuracy rapidly decreased with increasing tip branch length, and only tips of approximately 10% or less of the total height of the trees showed consistently accurate predictions (i.e. cross-validation R-squared > 0.75). When phylogenetic signal was weak, the effect of tip branch length was reduced, becoming negligible for traits simulated with λ < 0.7, where accuracy was in any case low. Our study shows that variability in prediction accuracy among individual tips of the phylogeny should be considered when evaluating the reliability of phylogenetically imputed trait values. To address this challenge, we describe a Monte Carlo-based method that allows one to estimate the expected tip-level accuracy of phylogenetic predictions for continuous traits. Our approach identifies gaps in functional trait datasets for which phylogenetic imputation performs poorly, and will help ecologists to design more efficient trait collection campaigns by focusing resources on lineages whose trait values are more uncertain.
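
The kind of Monte Carlo estimate described above can be illustrated with a minimal sketch. It is not the authors' code or the contents of this dataset. It assumes a hypothetical four-tip ultrametric tree of height 1 (two tips with branch length 0.1, two with branch length 0.8), simulates traits as multivariate normal draws whose covariance is the shared-path-length matrix with off-diagonals scaled by Pagel's λ, predicts each tip from the others by leave-one-out conditional expectation with a GLS-estimated root mean, and summarises accuracy per tip as 1 - SS_res/SS_tot across simulations. The function names, the toy tree, and this particular R-squared definition are all assumptions made for illustration.

```python
import numpy as np


def lambda_transform(C, lam):
    """Scale off-diagonal covariances by Pagel's lambda; keep the diagonal."""
    C_lam = C * lam
    np.fill_diagonal(C_lam, np.diag(C))
    return C_lam


def tip_level_r2(C, lam=1.0, n_sims=2000, seed=0):
    """Monte Carlo estimate of leave-one-out prediction R^2 for each tip.

    C is the phylogenetic covariance matrix (shared path lengths from the
    root). This is an illustrative sketch, not the published method.
    """
    rng = np.random.default_rng(seed)
    V = lambda_transform(C, lam)
    n = V.shape[0]
    # Each row is one simulated trait across all tips (root value 0).
    X = rng.multivariate_normal(np.zeros(n), V, size=n_sims)
    r2 = np.empty(n)
    for i in range(n):
        others = np.delete(np.arange(n), i)
        V_oo = V[np.ix_(others, others)]
        V_io = V[i, others]
        ones = np.ones(n - 1)
        # GLS estimate of the root mean, using only the observed tips.
        Vinv_ones = np.linalg.solve(V_oo, ones)
        mu = X[:, others] @ Vinv_ones / (ones @ Vinv_ones)
        # Conditional expectation of tip i given the other tips.
        weights = np.linalg.solve(V_oo, V_io)
        pred = mu + (X[:, others] - mu[:, None]) @ weights
        ss_res = np.sum((X[:, i] - pred) ** 2)
        ss_tot = np.sum((X[:, i] - X[:, i].mean()) ** 2)
        r2[i] = 1.0 - ss_res / ss_tot
    return r2


# Hypothetical tree ((A:0.1,B:0.1):0.9,(C:0.8,D:0.8):0.2), total height 1.
C_example = np.array([
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.2],
    [0.0, 0.0, 0.2, 1.0],
])

r2_bm = tip_level_r2(C_example, lam=1.0)
print(dict(zip("ABCD", np.round(r2_bm, 2))))
```

In this toy setting the short-branched tips A and B are predicted far better than the long-branched tips C and D. Rerunning with a smaller `lam` shows how weaker phylogenetic signal changes that contrast.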

Created:
2018-01-23