Data from: Potential and pitfalls of eukaryotic metagenome skimming: A test case for lichens

DataONE | Updated 2015-09-11 | Indexed 2024-06-27

Download link:

https://search.dataone.org/view/null

Description:

Whole-genome shotgun sequencing of multi-species communities using only a single library layout is commonly used to assess the taxonomic and functional diversity of microbial assemblages. Here we investigate to what extent such metagenome skimming approaches are applicable to in-depth genomic characterization of eukaryotic communities, e.g. lichens. We address how best to assemble a particular eukaryotic metagenome skimming data set, what pitfalls can occur, and what genome quality can be expected from such data. To facilitate project-specific benchmarking, we introduce the concept of twin sets: simulated data resembling the outcome of a particular metagenome sequencing study. We show that the quality of genome reconstructions depends essentially on assembler choice. Individual tools, including the metagenome assemblers Omega and MetaVelvet, are surprisingly sensitive to low and uneven coverage. Combined with the routine practice of choosing assembly parameters to optimize the assembly N50 size, these tools can exclude an entire genome from the assembly. In contrast, MIRA, an all-purpose overlap assembler, and SPAdes, a multi-sized de Bruijn graph assembler, facilitate a comprehensive view of the individual genomes across a wide range of coverage ratios. Testing assemblers on real-world metagenome skimming data from the lichen Lasallia pustulata demonstrates the applicability of twin sets for guiding method selection. Furthermore, it reveals that the assembly outcome for the photobiont Trebouxia sp. falls short of the a priori expectation given the simulations. Although the underlying reasons remain unclear, this highlights that further studies on this organism require special attention during sequence data generation and downstream analysis.
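
To illustrate why tuning parameters for N50 can hide the loss of a genome, here is a minimal Python sketch of the standard N50 calculation. The contig lengths are hypothetical and chosen only for illustration; they are not taken from the dataset, and the function name `n50` is our own.

```python
def n50(contig_lengths):
    """Return the N50: the length of the contig at which the running sum
    of contig lengths, sorted longest first, reaches half the assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


# Hypothetical assembly: three long contigs from a high-coverage genome
# plus twenty short contigs from a low-coverage genome.
both_genomes = [900, 800, 700] + [100] * 20

# Hypothetical assembly in which the low-coverage genome is missing entirely.
dominant_only = [900, 800, 700]

print(n50(both_genomes))   # 700
print(n50(dominant_only))  # 800
```

In this toy case the assembly that lacks the low-coverage genome has the higher N50, so selecting parameters by N50 alone would favor it.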

Created:

2015-09-11