Data from: Using Illumina Next Generation Sequencing Technologies to sequence multigene families in de novo species

DataONE. Updated 2013-02-08; indexed 2024-06-27

Download link:

https://search.dataone.org/view/null

Description:

The advent of Next Generation Sequencing Technology (NGST) has revolutionized molecular biology research, allowing for rapid gene/genome sequencing from a multitude of diverse species. As high throughput sequencing becomes more accessible, more efficient workflows must be developed to deal with the amounts of data produced and better assemble the genomes of de novo lineages. We combine traditional laboratory methods with Illumina NGST to amplify and sequence the largest mammalian multigene family, the Olfactory Receptor gene family, for species with and without a reference genome. We develop novel assembly methods to annotate and filter these data, which can be utilized for any gene family or any species. We find no significant difference between the ratio of genes within their respective gene families of our data compared with available genomic data. Using simulated data we explore the limitations of short-read sequence data and our assembly in recovering this gene family. We highlight the benefits and shortcomings of these methods. Compared with data generated from traditional polymerase chain reaction, cloning and Sanger sequencing methodologies, sequence data generated using our pipeline increases yield and sequencing efficiency without reducing the number of unique genes amplified. A cloning step is not required, therefore shortening data generation time. The novel downstream methodologies and workflows described provide a tool to be utilized by many fields of biology, to access and analyze the vast quantities of data generated. By combining laboratory and in silico methods, we provide a means of extracting genomic information for multigene families without complete genome sequencing.

Created:

2013-02-08