Data from: A simple approach for maximizing the overlap of phylogenetic and comparative data
Source: DataONE. Updated: 2015-12-02. Indexed: 2024-06-27.
Download link:
https://search.dataone.org/view/null
Description:
Biologists are increasingly using curated, public data sets to conduct phylogenetic comparative analyses. Unfortunately, there is often a mismatch between the species for which phylogenetic data exist and those for which other data are available. As a result, researchers are commonly forced either to drop species from analyses entirely or to impute the missing data.
A simple strategy to improve the overlap of phylogenetic and comparative data is to swap species in the tree that lack data with ‘phylogenetically equivalent’ species that have data. While this procedure is logically straightforward, it quickly becomes very challenging to do by hand. Here, we present algorithms that use topological and taxonomic information to maximize the number of swaps without altering the structure of the phylogeny.
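The swap idea in the preceding paragraph can be illustrated with a minimal Python sketch. This is not the phyndr algorithm: it is a deliberately conservative simplification that assumes species names are written as `Genus_epithet`, that every genus is monophyletic, and that a tip may be relabelled only when it is the sole representative of its genus in the tree. The function names `genus_of` and `swap_tips` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def genus_of(species):
    """Return the genus part of a name such as 'Quercus_alba' (assumed format)."""
    return species.split("_")[0]


def swap_tips(tree_tips, species_with_data):
    """Map each tree tip to the label it should carry after swapping.

    A tip that already has data keeps its name. A tip without data is
    relabelled only when it is the sole tip of its genus in the tree, so the
    relabelling cannot alter the topology (assuming the genus is monophyletic).
    """
    with_data = set(species_with_data)
    tip_set = set(tree_tips)

    # Count how many tips each genus contributes to the tree.
    tips_per_genus = defaultdict(list)
    for tip in tree_tips:
        tips_per_genus[genus_of(tip)].append(tip)

    # Species that have data but are absent from the tree, grouped by genus.
    # Sorting makes the choice of replacement deterministic.
    candidates = defaultdict(list)
    for species in sorted(with_data - tip_set):
        candidates[genus_of(species)].append(species)

    relabel = {}
    for tip in tree_tips:
        genus = genus_of(tip)
        sole_tip = len(tips_per_genus[genus]) == 1
        if tip not in with_data and sole_tip and candidates[genus]:
            relabel[tip] = candidates[genus][0]
        else:
            relabel[tip] = tip
    return relabel


tips = ["Quercus_alba", "Acer_rubrum", "Pinus_nigra", "Pinus_mugo"]
data = {"Quercus_robur", "Acer_rubrum", "Pinus_taeda"}
print(swap_tips(tips, data))
```

In this toy input, `Quercus_alba` is replaced by `Quercus_robur`, `Acer_rubrum` is kept because it already has data, and the two `Pinus` tips are left alone because the sketch has no way to tell which of them a congener could safely replace. Handling that last case is where topological information becomes necessary.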
We have implemented our method in a new R package, phyndr, which allows researchers to apply our algorithm to empirical data sets. It is efficient enough that taxon swaps can be computed quickly, even for large trees. To facilitate the use of taxonomic knowledge, we created a separate data package, taxonlookup; it contains a curated, versioned taxonomic lookup for land plants and is interoperable with phyndr.
Emerging online databases and statistical advances are making it possible for researchers to investigate evolutionary questions at unprecedented scales. However, in this effort, species mismatch among data sources will increasingly be a problem; evolutionary informatics tools, such as phyndr and taxonlookup, can help alleviate this issue.
Created:
2015-12-02



