
Data from: Construction of a species-level tree-of-life for the insects and utility in taxonomic profiling

Source: DataONE. Updated: 2016-10-27. Indexed: 2024-06-26.
Download link:
https://search.dataone.org/view/null
Description:
While comprehensive phylogenies have proven an invaluable tool in ecology and evolution, their construction is made increasingly challenging by both the scale and the structure of publicly available sequences. The distinct partition between gene-rich (genomic) and species-rich (DNA barcode) data is a feature that has been largely overlooked, yet it presents a key obstacle to scaling supermatrix analysis. I present a phyloinformatics framework for draft construction of a species-level phylogeny of insects (Class Insecta). Matrix-building requires separately optimized pipelines for nuclear transcriptomic, mitochondrial genomic, and species-rich markers, whereas tree-building requires hierarchical inference in order to capture species breadth while retaining deep-level resolution. The phylogeny of insects contains 49358 species, 13865 genera, 760 families, and 31 orders. Deep-level splits largely reflected previous findings for sections of the tree that are data-rich or unambiguous, such as inter-ordinal Endopterygota and Dictyoptera, the recently evolved and relatively homogeneous Lepidoptera, Hymenoptera, Brachycera (Diptera) and Cucujiformia (Coleoptera). However, analysis of bias, matrix construction and gene-tree variation suggests that confidence in some relationships (such as in Polyneoptera) is lower than has been indicated by the matrix bootstrap method. To assess the utility of the insect tree as a tool in query profiling, several tree-based taxonomic assignment methods are compared. Using mined test datasets of known species membership, a tendency is observed for greater accuracy of species-level assignments when using a fixed, comprehensive tree-of-life, in contrast to methods that generate smaller de novo reference trees. Described herein is a solution to the discrepancy in the way data are fitted into supermatrices. The resulting tree facilitates wider studies of insect diversification and the application of advanced descriptions of diversity in community studies, amongst other presumed applications.

Created: 2016-10-27