Data from: Construction of a species-level tree of life for the insects and utility in taxonomic profiling
Source: DataONE · updated 2016-10-27 · indexed 2024-06-26
Download link:
https://search.dataone.org/view/null
Abstract:
Although comprehensive phylogenies have proven an invaluable tool in ecology and evolution, their construction is made increasingly challenging by both the scale and the structure of publicly available sequence data. The distinct partition between gene-rich (genomic) and species-rich (DNA barcode) data is a feature that has been largely overlooked, yet it presents a key obstacle to scaling supermatrix analysis. I present a phyloinformatics framework for the draft construction of a species-level phylogeny of insects (Class Insecta). Matrix building requires separately optimized pipelines for nuclear transcriptomic, mitochondrial genomic, and species-rich markers, whereas tree building requires hierarchical inference in order to capture species breadth while retaining deep-level resolution. The resulting insect phylogeny contains 49,358 species, 13,865 genera, and 760 families. Deep-level splits largely reflected previous findings for sections of the tree that are data-rich or unambiguous, such as the inter-ordinal relationships of Endopterygota and Dictyoptera, and the recently evolved, relatively homogeneous Lepidoptera, Hymenoptera, Brachycera (Diptera), and Cucujiformia (Coleoptera). However, analysis of bias, matrix construction, and gene-tree variation suggests that confidence in some relationships (such as those within Polyneoptera) is lower than the matrix bootstrap method indicates. To assess the utility of the insect tree as a tool in query profiling, several tree-based taxonomic assignment methods are compared. On test data sets with existing taxonomic annotations, species-level assignments tended to be more accurate when a fixed, comprehensive tree of life was used than with methods that generate smaller de novo reference trees. Described herein is a solution to the discrepancy in the way data are fit into supermatrices.
The resulting tree facilitates wider studies of insect diversification and the application of advanced descriptions of diversity in community studies, among other potential applications.
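The obstacle described above, gene-rich data covering few species and barcode data covering many species with few genes, shows up directly as missing cells when per-gene alignments are concatenated into a supermatrix. The following is a minimal, generic sketch of supermatrix concatenation, not the author's pipeline: the function names `build_supermatrix` and `occupancy`, the gene labels `nuc1`, `nuc2` and `COI`, and the species `A` to `D` are all hypothetical, and the sequences are assumed to be pre-aligned within each gene.

```python
from typing import Dict

# Hypothetical input shape: gene name -> {species name: aligned sequence}
Alignment = Dict[str, str]


def build_supermatrix(alignments: Dict[str, Alignment], missing: str = "-") -> Dict[str, str]:
    """Concatenate per-gene alignments into one row per species.

    A species absent from a gene is padded with `missing` characters of
    that gene's alignment width, so every row ends up the same length.
    """
    species = sorted({sp for aln in alignments.values() for sp in aln})
    rows = {sp: [] for sp in species}
    for gene in sorted(alignments):
        aln = alignments[gene]
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"{gene} must contain equal-length aligned sequences")
        width = widths.pop()
        for sp in species:
            rows[sp].append(aln.get(sp, missing * width))
    return {sp: "".join(parts) for sp, parts in rows.items()}


def occupancy(alignments: Dict[str, Alignment]) -> float:
    """Fraction of (species, gene) cells in the supermatrix that hold data."""
    species = {sp for aln in alignments.values() for sp in aln}
    filled = sum(len(aln) for aln in alignments.values())
    return filled / (len(species) * len(alignments))


if __name__ == "__main__":
    # Toy data: two "genomic" genes for 2 species, one "barcode" gene for 4 species.
    toy = {
        "nuc1": {"A": "ACGT", "B": "ACGA"},
        "nuc2": {"A": "TTGA", "B": "TTGC"},
        "COI": {"A": "AAC", "B": "AAT", "C": "GAC", "D": "GAT"},
    }
    for sp, row in build_supermatrix(toy).items():
        print(sp, row)
    print(f"occupancy: {occupancy(toy):.2f}")
```

In this toy case the barcode-only species `C` and `D` are mostly gap characters, and occupancy is 8 of 12 cells. Scaled up to tens of thousands of barcode-only species, this sparsity is one reason a single naive concatenation is hard to scale, which is consistent with the abstract's case for separate matrix-building pipelines and hierarchical tree inference.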
Created:
2016-10-27



