Data from: Potential and pitfalls of eukaryotic metagenome skimming: A test case for lichens

DataONE · Updated 2015-09-11 · Indexed 2024-06-27

Download link:

https://search.dataone.org/view/null

Resource description:

Whole-genome shotgun sequencing of multi-species communities using only a single library layout is commonly used to assess the taxonomic and functional diversity of microbial assemblages. Here we investigate to what extent such metagenome skimming approaches are applicable to in-depth genomic characterizations of eukaryotic communities, e.g. lichens. We address how best to assemble a particular eukaryotic metagenome skimming data set, what pitfalls can occur, and what genome quality can be expected from such data. To facilitate project-specific benchmarking, we introduce the concept of twin sets: simulated data resembling the outcome of a particular metagenome sequencing study. We show that the quality of genome reconstructions depends essentially on assembler choice. Individual tools, including the metagenome assemblers Omega and MetaVelvet, are surprisingly sensitive to low and uneven coverage. Combined with the routine practice of choosing assembly parameters to optimize the assembly N50 size, these tools can exclude an entire genome from the assembly. In contrast, MIRA, an all-purpose overlap assembler, and SPAdes, a multi-sized de Bruijn graph assembler, facilitate a comprehensive view of the individual genomes across a wide range of coverage ratios. Testing assemblers on real-world metagenome skimming data from the lichen Lasallia pustulata demonstrates the applicability of twin sets for guiding method selection. Furthermore, it reveals that the assembly outcome for the photobiont Trebouxia sp. falls short of the a priori expectation from the simulations. Although the underlying reasons remain unclear, this highlights that further studies on this organism require special attention during sequence data generation and downstream analysis.
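
The description notes that tuning parameters for the highest N50 can cause an entire genome to be lost from the assembly. The sketch below is a minimal illustration of why that can happen, not the authors' procedure: the function name `n50` and all contig lengths are hypothetical and not taken from the study. It computes N50 for two made-up assemblies, one that keeps a fragmented low-coverage genome and one that drops it.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list).

    N50 is the length of the contig at which the running total of
    lengths, taken from longest to shortest, first reaches half of
    the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths in bp (illustrative assumptions only).
high_coverage_genome = [50_000, 40_000, 30_000]  # well-assembled genome
low_coverage_genome = [2_000] * 100              # fragmented, low-coverage genome

both_genomes = high_coverage_genome + low_coverage_genome
only_high = high_coverage_genome

print("N50 with both genomes:", n50(both_genomes))
print("N50 with the low-coverage genome dropped:", n50(only_high))
```

In this toy case the assembly that omits the low-coverage genome scores a far higher N50, so a parameter search driven by N50 alone would favor it even though it represents less of the community.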

Created:

2015-09-11