
Data from: Utility of pooled sequencing for association mapping in non-model organisms

Source: DataONE | Updated: 2018-03-16 | Indexed: 2024-06-25
Download link:
https://search.dataone.org/view/null
Description:
High-density genome-wide sequencing increases the likelihood of discovering genes of major effect and genomic structural variation in organisms. Although reference genomes are increasingly available across broad taxa, the greatest limitation to whole-genome sequencing of multiple individuals remains the cost of sequencing. To reduce that cost, multiple individuals with similar phenotypes can be pooled and the homogenized DNA sequenced (Pool-Seq), which achieves high genome coverage at the loss of individual genotypes. Although Pool-Seq has been an effective method for association mapping in model organisms, it has not been frequently used in natural populations. To extend bioinformatic tools for rapid implementation of Pool-Seq data in non-model organisms, we developed a pipeline called PoolParty and illustrate its effectiveness in genetic association mapping. Alignment expectations based on five pooled Chinook salmon (Oncorhynchus tshawytscha) libraries showed that approximately 48% genome coverage per library could be achieved with reasonable sequencing effort. We additionally examined male and female O. tshawytscha libraries to illustrate how Pool-Seq techniques can successfully map known genes associated with functional differences between the sexes, such as growth hormone 2. Finally, we compared pools of individuals of different spawning ages for each sex to discover novel genes involved in age at maturity in O. tshawytscha, such as opsin4 and transmembrane protein 19. While not appropriate for every system, Pool-Seq data processed by the PoolParty pipeline offer a practical method for identifying genes of major effect in non-model organisms when high genome coverage is necessary and cost is a limiting factor.
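
Because Pool-Seq discards individual genotypes, association between two pools is typically assessed from allele read counts at each variant site. The description above does not state which statistic PoolParty applies, so the following is only a minimal sketch under an assumption: it uses a two-sided Fisher's exact test on a 2x2 table of reference and alternate read counts, one commonly used option for comparing pools. The function name and the read counts are hypothetical and serve only as illustration.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical, for illustration):
        row 1 = pool 1 (reference reads, alternate reads)
        row 2 = pool 2 (reference reads, alternate reads)
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical read counts at one SNP: (reference, alternate) per pool
    pool_1 = (45, 5)
    pool_2 = (20, 30)
    p_value = fisher_exact_two_sided(*pool_1, *pool_2)
    print(f"alt freq pool 1: {pool_1[1] / sum(pool_1):.2f}")
    print(f"alt freq pool 2: {pool_2[1] / sum(pool_2):.2f}")
    print(f"two-sided p-value: {p_value:.3g}")
```

In a genome-wide scan a test of this kind would be repeated at every variant site, so the resulting p-values would need correction for multiple testing before any site is called significant.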

Created: 2018-03-16