Oncorhynchus mykiss isolate:Arlee Genome sequencing and assembly
Source: agdatacommons.nal.usda.gov. Updated: 2024-11-23. Indexed: 2025-03-22.
Download link:
https://agdatacommons.nal.usda.gov/articles/dataset/Oncorhynchus_mykiss_isolate_Arlee_Genome_sequencing_and_assembly/25086479/1
Description:
Although the most recent version of the rainbow trout genome assembly from the Swanson line has greatly improved the genome reference and is reliable for gene prediction, it contains 420,055 spanned gaps and 7,839 unspanned gaps (GCA_002163495.1). Hence, there is still a need to improve the contiguity and completeness of the reference assembly, which is now possible with long-read DNA sequencing technologies. We are also working towards generating a rainbow trout pan-genome reference that will better represent the genetic diversity in this species. The Arlee doubled haploid YY male line has a different genetic background from the Swanson line. It originated from a domesticated strain that was originally collected from the northern California coast. For the Arlee genome assembly, we generated 111x genome coverage in long-read sequence data using the PacBio Sequel system. The read length distribution has an N50 of about 33 kb and an average read length greater than 20 kb. Contigs were assembled using the Canu pipeline, and the consensus sequence was error-corrected using two iterations of Arrow with the PacBio reads followed by one iteration of Freebayes using Illumina paired-end reads. The Canu assembly contained 1,591 contigs with an N50 contig length of 9.8 Mbp, which is a major improvement in contiguity compared to the current Swanson assembly. The assembly was further improved with a Bionano optical map and Hi-C proximity ligation sequence data to produce super-scaffolds. The total length of the final assembly is ~2.33 Gbp, of which ~95% was anchored into 29 chromosome sequences using the same rainbow trout high-density genetic map that we previously used for the Swanson reference genome assembly. The new assembly is composed primarily of 32 major scaffolds corresponding perfectly to the karyotype of the Arlee line (2N=64). Six of the Arlee acrocentric chromosomes can be perfectly aligned with three of the Swanson line metacentric chromosomes. The three Swanson chromosomes that are each split into two acrocentric chromosomes are Omy04, Omy14 and Omy25, as we previously described in Pearse et al. (2019).
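The description reports N50 values for both the read lengths and the contigs. As a reminder of what that statistic means, the following is a minimal sketch of one common N50 definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length. The function name `n50` and the toy lengths are illustrative assumptions. They are not taken from the Arlee data or from the tools named above, which may compute the statistic with slightly different conventions.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sort lengths from longest to shortest and accumulate them; the N50 is
    the length at which the running sum first reaches half of the total.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in bp (made-up values, NOT from the Arlee assembly).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70: 80 + 70 = 150, half of the 300 bp total
```

Under this definition, an N50 contig length of 9.8 Mbp would mean that contigs of at least 9.8 Mbp account for at least half of the assembled bases.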
Provider:
National Center for Biotechnology Information



