Draft de novo genome assemblies of a male and female Amphibolurus muricatus (jacky dragon)

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/5523787
Description:
Four de novo nuclear genome assemblies of Amphibolurus muricatus

Assembly 1.0: A 10x Genomics linked-read sequencing assembly
• AmpMurF_1.0.fa.tar.gz (female A. muricatus)
• AmpMurM_1.0.fa.tar.gz (male A. muricatus)

Assembly 1.1: Further scaffolding of assembly 1.0 using RNA-seq data
• AmpMurF_1.1.fa.tar.gz (female A. muricatus)
• AmpMurM_1.1.fa.tar.gz (male A. muricatus)

Assembly 2.0: Further scaffolding of assembly 1.0 using SLR-superscaffolder
• AmpMurF_2.0.fa.tar.gz (female A. muricatus)
• AmpMurM_2.0.fa.tar.gz (male A. muricatus)

Assembly 3.0: An stLFR linked-read sequencing assembly
• AmpMurF_3.0.fa.tar.gz (female A. muricatus)
• AmpMurM_3.0.fa.tar.gz (male A. muricatus)

Methods

Assembly 1.0: A 10x Genomics linked-read sequencing assembly

Male and female A. muricatus genome sequencing libraries were constructed on the Chromium system (10x Genomics, Pleasanton, CA, USA) by the Ramaciotti Centre for Genomics (Sydney, Australia). The Chromium instrument enables unique barcoding of long stretches of DNA on gel beads. The barcodes allow later reconstruction of long DNA fragments from a series of short DNA fragments with the same barcode (i.e., linked-reads). After barcoding, DNA was sheared into smaller fragments and sequenced on the NovaSeq 6000 platform (Illumina, CA, USA) to generate 151 bp paired-end (PE) reads. A total of 904.9 M raw 10x Genomics Chromium linked-reads were generated.

Raw 10x data were assembled with Supernova v2.1.1 (Weisenfeld et al., 2017), and a FASTA file was generated using the ‘pseudohap style’ option in Supernova mkoutput. All female (~450 M) and male (~550 M) read pairs were utilised (female sequencing depth ca. 50.3×; male, ca. 47.8×). The resulting assemblies were further scaffolded with ARKS v1.0.3 (Coombe et al., 2018), reusing the 10x reads, and the companion LINKS program (v1.8.7) (Warren et al., 2015).
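The linked-read principle described above (short reads that share a barcode are taken to derive from the same long DNA fragment) can be illustrated with a toy sketch. This is not how Supernova or ARKS process reads internally; the function name, the barcodes, and the sequences below are all made up for illustration.

```python
from collections import defaultdict


def group_reads_by_barcode(reads):
    """Group short reads by barcode.

    `reads` is an iterable of (barcode, sequence) pairs. Reads that share a
    barcode are collected together, mimicking the idea of 'linked' reads.
    """
    groups = defaultdict(list)
    for barcode, sequence in reads:
        groups[barcode].append(sequence)
    return dict(groups)


# Toy input: hypothetical barcodes and sequences, not real data.
toy_reads = [
    ("AAACGT", "ACGTACGT"),
    ("TTTGCA", "GGGTTTAA"),
    ("AAACGT", "TTGACCAT"),
]

linked = group_reads_by_barcode(toy_reads)
print(linked["AAACGT"])
```

In real data the grouping is only a starting point: assemblers and scaffolders combine barcode information with sequence overlap or k-mer evidence.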
ARKS employs a k-mer approach to map linked barcodes to the contigs in the initial Supernova assembly, generating a scaffold graph with estimated distances as input for LINKS. These assemblies were denoted AmpMurF_1.0 (female) and AmpMurM_1.0 (male). We used GapCloser v1.12 (part of SOAPdenovo2) (Luo et al., 2012) to fill gaps in the assembly. GapCloser was run with the parameter -l 150 and clean 10x Genomics PE reads.

Assembly 1.1: Further scaffolding using RNA-seq data

We attempted to improve the contiguity of the v1.0 genome assemblies using RNA-sequencing reads. RNA-seq reads (from brain, ovary, and testis; see below) were filtered (i.e., cleaned) to remove adapters and low-quality reads using Flexbar v3.4.0. The default Flexbar settings discard all reads with any uncalled bases. The cleaned reads were used to re-scaffold the v1.0 assemblies (FASTA files before gap closing) with P_RNA_scaffolder (Zhu et al., 2018). A final round of scaffolding was performed on the resulting assemblies using L_RNA_scaffolder (Xue et al., 2013). These assemblies were denoted AmpMurF_1.1 (female) and AmpMurM_1.1 (male). As before, GapCloser and clean 10x Genomics reads were used to fill gaps.

Assembly 2.0: Further scaffolding using SLR-superscaffolder

As an alternative approach, we attempted to improve the contiguity of the v1.0 genome assemblies using SLR-superscaffolder (Guo et al., 2021). Briefly, SLR-superscaffolder employs single tube long fragment read (stLFR) sequencing reads (Wang et al., 2019; see section below) to generate hybrid genome assemblies. The software was run with default parameters except for PE_SEED_MIN=300 (minimum contig size to fill; default 1000). These assemblies were denoted AmpMurF_2.0 (female) and AmpMurM_2.0 (male). GapCloser and clean stLFR reads (with the barcode removed using https://github.com/BGI-Qingdao/stLFR_barcode_split) were used to fill gaps.
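To make the barcode-removal step concrete, here is a minimal sketch of separating the genomic bases from the barcode segment of an stLFR read 2. It is not the implementation of stLFR_barcode_split. The 100-bp read length and 42-bp barcode length are taken from the stLFR data described in this record; that the barcode segment directly follows the genomic bases is an assumption, and the function name and toy record are hypothetical.

```python
READ_LEN = 100     # genomic bases in read 2 (as stated for the stLFR data)
BARCODE_LEN = 42   # barcode segment length (as stated for the stLFR data)


def split_read2(sequence, quality, read_len=READ_LEN, barcode_len=BARCODE_LEN):
    """Split a read-2 record into (genomic sequence, genomic quality, barcode).

    Assumption: the barcode segment directly follows the genomic bases.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    if len(sequence) < read_len + barcode_len:
        raise ValueError("read is shorter than read_len + barcode_len")
    genomic_seq = sequence[:read_len]
    genomic_qual = quality[:read_len]
    barcode = sequence[read_len:read_len + barcode_len]
    return genomic_seq, genomic_qual, barcode


# Toy record: 100 'A' bases standing in for genomic sequence, 42 'C' bases
# standing in for the barcode segment. Not real data.
toy_seq = "A" * 100 + "C" * 42
toy_qual = "I" * 142

seq, qual, barcode = split_read2(toy_seq, toy_qual)
print(len(seq), len(qual), len(barcode))
```

Trimming the quality string together with the sequence keeps the FASTQ record valid for downstream tools such as GapCloser, which expect plain paired-end reads.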

Assembly 3.0: An stLFR linked-read sequencing and Supernova assembly

We also generated independent assemblies for the individuals sequenced on the 10x Genomics Chromium system using single tube long fragment read (stLFR) sequencing (Wang et al., 2019). BGI (Brisbane, Australia) generated ~100×-coverage 100-bp paired-end reads (plus a 42-bp stLFR barcode on the right/_2 read). Low-quality reads, PCR duplicates, and adaptors were removed using SOAPnuke v1.5 (Chen et al., 2018). The stLFRdenovo pipeline (https://github.com/BGI-biotools/stLFRdenovo), which is based on Supernova and customized for stLFR data, was used to generate a de novo genome assembly. The stLFRdenovo tool ‘FillGaps’ was used to fill gaps.

References

Chen, Y., Chen, Y., Shi, C., Huang, Z., Zhang, Y., Li, S., Li, Y., Ye, J., Yu, C., Li, Z., et al. (2018). SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. Gigascience 7, 1-6.
Coombe, L., Zhang, J., Vandervalk, B.P., Chu, J., Jackman, S.D., Birol, I., and Warren, R.L. (2018). ARKS: chromosome-scale scaffolding of human genome drafts with linked read kmers. BMC Bioinformatics 19, 234.
Guo, L., Xu, M., Wang, W., Gu, S., Zhao, X., Chen, F., Wang, O., Xu, X., Seim, I., Fan, G., et al. (2021). SLR-superscaffolder: a de novo scaffolding tool for synthetic long reads using a top-to-bottom scheme. BMC Bioinformatics 22, 158.
Luo, R., Liu, B., Xie, Y., Li, Z., Huang, W., Yuan, J., He, G., Chen, Y., Pan, Q., Liu, Y., et al. (2012). SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18.
Wang, O., Chin, R., Cheng, X., Wu, M.K.Y., Mao, Q., Tang, J., Sun, Y., Anderson, E., Lam, H.K., Chen, D., et al. (2019). Efficient and unique cobarcoding of second-generation sequencing reads from long DNA molecules enabling cost-effective and accurate sequencing, haplotyping, and de novo assembly. Genome Res 29, 798-808.
Warren, R.L., Yang, C., Vandervalk, B.P., Behsaz, B., Lagman, A., Jones, S.J., and Birol, I. (2015). LINKS: Scalable, alignment-free scaffolding of draft genomes with long reads. Gigascience 4, 35.
Weisenfeld, N.I., Kumar, V., Shah, P., Church, D.M., and Jaffe, D.B. (2017). Direct determination of diploid genome sequences. Genome Res 27, 757-767.
Xue, W., Li, J.T., Zhu, Y.P., Hou, G.Y., Kong, X.F., Kuang, Y.Y., and Sun, X.W. (2013). L_RNA_scaffolder: scaffolding genomes with transcripts. BMC Genomics 14, 604.
Zhu, B.H., Xiao, J., Xue, W., Xu, G.C., Sun, M.Y., and Li, J.T. (2018). P_RNA_scaffolder: a fast and accurate genome scaffolder using paired-end RNA-sequencing reads. BMC Genomics 19, 175.
Created:
2023-08-10