
Data from: Construction of a species-level tree-of-life for the insects and utility in taxonomic profiling

Source: DataONE. Created 2016-10-27; updated 2024-06-26.
Download link:
https://search.dataone.org/view/null
Summary:
Although comprehensive phylogenies have proven an invaluable tool in ecology and evolution, their construction is made increasingly challenging by both the scale and the structure of publicly available sequences. The distinct partition between gene-rich (genomic) and species-rich (DNA barcode) data is a feature of the data that has been largely overlooked, yet it presents a key obstacle to scaling supermatrix analysis. I present a phyloinformatics framework for the draft construction of a species-level phylogeny of insects (Class Insecta). Matrix building requires separately optimized pipelines for nuclear transcriptomic, mitochondrial genomic and species-rich markers, whereas tree building requires hierarchical inference in order to capture species breadth while retaining deep-level resolution. The phylogeny of insects contains 49,358 species, 13,865 genera and 760 families. Deep-level splits largely reflected previous findings for sections of the tree that are data-rich or unambiguous, such as inter-ordinal relationships within Endopterygota and Dictyoptera, and the recently evolved, relatively homogeneous Lepidoptera, Hymenoptera, Brachycera (Diptera) and Cucujiformia (Coleoptera). However, analyses of bias, matrix construction and gene-tree variation suggest that confidence in some relationships (such as those within Polyneoptera) is lower than indicated by the matrix bootstrap. To assess the utility of the insect tree as a tool for query profiling, several tree-based taxonomic assignment methods are compared. On test data sets with existing taxonomic annotations, species-level assignments tend to be more accurate when a fixed, comprehensive tree of life is used than with methods that generate smaller de novo reference trees. Described herein is a solution to the discrepancy in the way data are fitted into supermatrices.
The resulting tree facilitates wider studies of insect diversification and the application of advanced descriptions of diversity in community studies, among other potential applications.
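The partition between gene-rich and species-rich data can be seen by concatenating per-locus alignments into a supermatrix. Below is a minimal sketch in Python of plain supermatrix concatenation, assuming each locus is already aligned and supplied as a dict of taxon name to aligned sequence; taxa absent from a locus are padded with gap characters. The function names, locus names and toy sequences are hypothetical and for illustration only; this is not the author's pipeline, which uses separately optimized matrix-building steps that are not shown here.

```python
from typing import Dict, List, Tuple

# Hypothetical types for this sketch: one alignment per locus,
# mapping taxon name -> aligned sequence (all rows the same length).
Alignment = Dict[str, str]
Partition = Tuple[str, int, int]  # (locus, first column, last column), 1-based


def build_supermatrix(
    loci: Dict[str, Alignment], gap: str = "-"
) -> Tuple[Dict[str, str], List[Partition]]:
    """Concatenate per-locus alignments into a single supermatrix.

    Taxa absent from a locus are padded with gap characters so that
    every row has the same total length.
    """
    taxa = sorted({taxon for aln in loci.values() for taxon in aln})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Partition] = []
    start = 1
    for locus, aln in loci.items():
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"locus {locus!r} is empty or not aligned")
        width = widths.pop()
        for taxon in taxa:
            # Missing data: a block of gaps the width of this locus.
            rows[taxon].append(aln.get(taxon, gap * width))
        partitions.append((locus, start, start + width - 1))
        start += width
    return {taxon: "".join(parts) for taxon, parts in rows.items()}, partitions


def occupancy(loci: Dict[str, Alignment], taxon: str) -> float:
    """Fraction of loci in which the taxon has any sequence."""
    return sum(taxon in aln for aln in loci.values()) / len(loci)


if __name__ == "__main__":
    # Toy data only: two "gene-rich" taxa sampled for nuclear loci and COI,
    # two "species-rich" taxa known only from a COI barcode.
    toy_loci = {
        "nuclear_A": {"sp1": "ATGC", "sp2": "ATGA"},
        "nuclear_B": {"sp1": "GGT", "sp2": "GGA"},
        "COI": {"sp1": "TTAC", "sp2": "TTAT", "sp3": "TTGC", "sp4": "TCGC"},
    }
    matrix, parts = build_supermatrix(toy_loci)
    for taxon, row in matrix.items():
        print(taxon, row, f"occupancy={occupancy(toy_loci, taxon):.2f}")
    print(parts)
```

On the toy data, the barcode-only taxa `sp3` and `sp4` come out as `-------TTGC` and `-------TCGC` with occupancy 0.33, while `sp1` and `sp2` fill every column. As barcode-only taxa are added, this block of missing data grows, which illustrates the structure of data the abstract describes.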

Created: 2016-10-27