Data from: Practical low-coverage genomewide sequencing of hundreds of individually barcoded samples for population and evolutionary genomics in nonmodel species

Source: DataONE. Updated: 2016-08-03. Indexed: 2024-06-26

Download link:

https://search.dataone.org/view/null

Description:

Today most population genomic studies of nonmodel organisms either sequence a subset of the genome deeply in each individual or sequence pools of unlabelled individuals. With a step-by-step workflow, we illustrate how low-coverage whole-genome sequencing of hundreds of individually barcoded samples is now a practical alternative strategy for obtaining genomewide data on a population scale. We used a highly efficient protocol to generate high-quality libraries for ~6.5 USD from each of 876 Atlantic silversides (a teleost fish with a genome size ~730 Mb) that we sequenced to 1–4× genome coverage. In the absence of a reference genome, we developed a bioinformatic pipeline for mapping the genomic reads to a de novo assembled reference transcriptome. This provides an ‘in silico’ method for exome capture that avoids the complexities and expenses of using wet chemistry for target isolation. Using novel tools for analysis of low-coverage data, we extracted population allele frequencies, individual genotype likelihoods and polymorphism data for 2 504 335 SNPs across the exome for the 876 fish. To illustrate the use of the resulting data, we present a preliminary analysis of geographical patterns in the exome data and a comparison of complete mitochondrial genome sequences for each individual (constructed from the low-coverage data) that show population colonization patterns along the US east coast. With a total cost per sample of less than 50 USD (including sequencing) and ability to prepare 96 libraries in only 5 h, our approach adds a viable new option to the population genomics toolbox.
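The abstract refers to individual genotype likelihoods extracted from 1–4× data. As a rough illustration of why low-coverage analyses work with likelihoods rather than hard genotype calls, the sketch below computes P(reads | genotype) at one biallelic site. It assumes a simple symmetric error model with a fixed per-base error rate; this is not the authors' pipeline or tool set, and the function names and the `error_rate` default are hypothetical.

```python
import math

def genotype_likelihoods(bases, ref, alt, error_rate=0.01):
    """Return P(reads | genotype) for 0, 1 and 2 copies of the alt allele.

    Assumption: reads are independent and every base has the same
    symmetric error rate (real data would use per-base quality scores).
    """
    likelihoods = []
    for alt_copies in (0, 1, 2):
        p_sample_alt = alt_copies / 2  # chance a read comes from an alt chromosome
        log_l = 0.0
        for b in bases:
            # Probability of seeing base b if the read came from a ref / alt chromosome
            p_given_ref = 1 - error_rate if b == ref else error_rate / 3
            p_given_alt = 1 - error_rate if b == alt else error_rate / 3
            log_l += math.log((1 - p_sample_alt) * p_given_ref
                              + p_sample_alt * p_given_alt)
        likelihoods.append(math.exp(log_l))
    return likelihoods

def normalize(values):
    """Rescale likelihoods so they sum to 1 (equivalent to a flat prior)."""
    total = sum(values)
    return [v / total for v in values]

# One read showing the reference base: the heterozygote remains plausible.
single_read = normalize(genotype_likelihoods(["A"], ref="A", alt="G"))
# Two reads, one of each allele: the heterozygote dominates.
two_reads = normalize(genotype_likelihoods(["A", "G"], ref="A", alt="G"))
```

With a single read, the heterozygous genotype keeps about a third of the normalized weight, so a hard call would be unreliable. Carrying the likelihoods forward lets downstream estimates, such as population allele frequencies, account for that uncertainty.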

Created:

2016-08-03