
Data from: Construction of a species-level tree-of-life for the insects and utility in taxonomic profiling

DataONE · Updated 2016-10-27 · Indexed 2024-06-26
Download link:
https://search.dataone.org/view/null
Resource description:
While comprehensive phylogenies have proven an invaluable tool in ecology and evolution, their construction is made increasingly challenging by both the scale and the structure of publicly available sequences. The distinct partition between gene-rich (genomic) and species-rich (DNA barcode) data is a feature of the data that has been largely overlooked, yet it presents a key obstacle to scaling supermatrix analysis. I present a phyloinformatics framework for draft construction of a species-level phylogeny of insects (Class Insecta). Matrix-building requires separately optimized pipelines for nuclear transcriptomic, mitochondrial genomic, and species-rich markers, whereas tree-building requires hierarchical inference in order to capture species breadth while retaining deep-level resolution. The phylogeny of insects contains 49358 species, 13865 genera, 760 families, and 31 orders. Deep-level splits largely reflected previous findings for sections of the tree that are data rich or unambiguous, such as inter-ordinal Endopterygota and Dictyoptera, the recently evolved and relatively homogeneous Lepidoptera, Hymenoptera, Brachycera (Diptera) and Cucujiformia (Coleoptera). However, analysis of bias, matrix construction and gene-tree variation suggests that confidence in some relationships (such as in Polyneoptera) is lower than has been indicated by the matrix bootstrap method. To assess the utility of the insect tree as a tool in query profiling, several tree-based taxonomic assignment methods are compared. Using mined test datasets of known species membership, a tendency is observed for greater accuracy of species-level assignments when using a fixed, comprehensive tree-of-life, in contrast to methods generating smaller de novo reference trees. Described herein is a solution to the discrepancy in the way data are fitted into supermatrices. The resulting tree facilitates wider studies of insect diversification and the application of advanced descriptions of diversity in community studies, amongst other presumed applications.
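
To illustrate the gene-rich versus species-rich partition mentioned in the description, the following is a minimal sketch of how per-marker alignments might be concatenated into a supermatrix, with taxa absent from a marker padded by missing-data characters. It is not the dataset author's pipeline: the function names (`build_supermatrix`, `missing_fraction`), the taxon labels, the marker names, and the toy sequences are all hypothetical and chosen only for illustration.

```python
def build_supermatrix(alignments):
    """Concatenate per-marker alignments into one supermatrix.

    `alignments` maps marker name -> {taxon: aligned sequence}.
    Taxa missing from a marker are padded with '?' for that marker's width.
    (Hypothetical helper for illustration only.)
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    matrix = {taxon: "" for taxon in taxa}
    for marker in sorted(alignments):  # fixed marker order for reproducibility
        aln = alignments[marker]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"marker {marker!r} is empty or not aligned")
        width = lengths.pop()
        for taxon in taxa:
            matrix[taxon] += aln.get(taxon, "?" * width)
    return matrix


def missing_fraction(matrix):
    """Fraction of cells in the supermatrix that are missing data ('?')."""
    total = sum(len(seq) for seq in matrix.values())
    missing = sum(seq.count("?") for seq in matrix.values())
    return missing / total if total else 0.0


# Toy data (assumed): a barcode-like marker sampled for every species,
# and a nuclear marker sampled for only two of them.
toy_alignments = {
    "COI": {"sp_A": "ATG", "sp_B": "ATA", "sp_C": "ACG", "sp_D": "ATT"},
    "nuclear_1": {"sp_A": "GGCA", "sp_B": "GGTA"},
}

toy_matrix = build_supermatrix(toy_alignments)
for taxon, seq in toy_matrix.items():
    print(taxon, seq)
print(f"missing: {missing_fraction(toy_matrix):.2%}")
```

In this toy case the species-rich marker covers all four taxa while the gene-rich marker covers only two, so the combined matrix is dominated by missing cells for the sparsely sequenced species. That imbalance is the kind of obstacle the description says motivates separately optimized matrix-building pipelines.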

Created:
2016-10-27