Data from: The unexpected depths of genome-skimming data: a case study examining Goodeniaceae floral symmetry genes
Source: DataONE. Updated: 2017-10-20. Indexed: 2024-06-26.
Download link:
https://search.dataone.org/view/null
Resource description:
Premise of the study: The use of genome skimming allows systematists to quickly generate large data sets, particularly of sequences in high abundance (e.g., plastomes); however, researchers may be overlooking data in low abundance that could be used for phylogenetic or evo-devo studies. Here, we present a bioinformatics approach that explores the low-abundance portion of genome-skimming next-generation sequencing libraries in the fan-flowered Goodeniaceae.
Methods: Twenty-four previously constructed Goodeniaceae genome-skimming Illumina libraries were examined for their utility in mining low-copy nuclear genes involved in floral symmetry, specifically the CYCLOIDEA (CYC)-like genes. De novo assemblies were generated using multiple assemblers, and BLAST searches were performed for CYC1, CYC2, and CYC3 genes.
Results: Overall, Trinity, SOAPdenovo-Trans, and SOAPdenovo implementing lower k-mer values uncovered the most data, although no assembler consistently outperformed the others. Using SOAPdenovo-Trans across all 24 data sets, we recovered four CYC-like gene groups (CYC1, CYC2, CYC3A, and CYC3B) from a majority of the species. Alignments of the fragments included the entire coding sequence as well as upstream and downstream regions.
Discussion: Genome-skimming data sets can provide a significant source of low-copy nuclear gene sequence data that may be used for multiple downstream applications.
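The BLAST step described in the Methods can be illustrated with a minimal sketch. It assumes the searches were saved in BLAST+ tabular format (`-outfmt 6`), with the CYC reference genes as queries and the assembled contigs as subjects. The record does not state the output format, the e-value cutoff, or any contig names, so the `1e-5` threshold, the function names, and the example rows below are hypothetical and are not taken from the data package.

```python
from collections import defaultdict

# Standard column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_hits(lines, max_evalue=1e-5):
    """Yield one dict per tabular hit whose e-value is at or below max_evalue.

    The 1e-5 default is an illustrative assumption, not the study's cutoff.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.split("\t")))
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        if row["evalue"] <= max_evalue:
            yield row


def contigs_by_gene(hits):
    """Map each query gene to its matching contigs, best bit score first."""
    best = defaultdict(dict)
    for hit in hits:
        gene, contig = hit["qseqid"], hit["sseqid"]
        # Keep only the highest bit score seen for each gene/contig pair.
        if hit["bitscore"] > best[gene].get(contig, 0.0):
            best[gene][contig] = hit["bitscore"]
    return {gene: sorted(scores, key=scores.get, reverse=True)
            for gene, scores in best.items()}


# Hypothetical rows: query gene, contig, then the remaining outfmt 6 columns.
example = [
    "CYC1\tcontig_12\t88.5\t400\t46\t0\t1\t400\t10\t409\t1e-120\t450.0",
    "CYC1\tcontig_40\t75.0\t120\t30\t0\t1\t120\t5\t124\t1e-20\t90.0",
    "CYC2\tcontig_07\t91.0\t350\t31\t0\t1\t350\t1\t350\t1e-110\t420.0",
    "CYC3\tcontig_99\t60.0\t50\t20\t0\t1\t50\t1\t50\t0.5\t25.0",
]
result = contigs_by_gene(parse_hits(example))
print(result)
```

In this sketch the weak CYC3 row is dropped by the e-value filter, and each remaining gene maps to its candidate contigs ranked by bit score. Running the same grouping once per assembler would be one way to compare how many CYC-like fragments each assembly recovers.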
Created: 2017-10-20



