Chromosome-scale genome assembly of the African spiny mouse (Acomys cahirinus)

NIAID Data Ecosystem, indexed 2026-05-01

Download link:

https://zenodo.org/record/7761276

Resource description:

Genomic DNA was extracted from the blood of a single male A. cahirinus animal using a Monarch HMW DNA Extraction Kit for Cells & Blood (T3050, New England Biolabs, Ipswich, MA) following the manufacturer’s recommended protocol. DNA was quantified prior to library construction using the Qubit DNA HS Assay (Thermo Fisher Scientific, Waltham, MA), and DNA fragment lengths were assessed using the Agilent Femto Pulse System (Santa Clara, CA). Libraries were prepared for sequencing using the Oxford Nanopore ligation kit (SQK-LSK110) following the manufacturer’s instructions, except that DNA repair and A-tailing were performed for 30 min and the ligation was allowed to continue for 1 hr. Prepared libraries were quantified using a Qubit fluorometer, and 30 fmol of the library was loaded onto a Nanopore R9.4.1 flow cell and run on a PromethION running MinKNOW version 21.05.20. To increase output, the flow cell was washed after approximately 24 hr of sequencing, then an additional 12 fmol of library was added to the flow cell and run for an additional 48 hr.

Basecalling was performed using Guppy 5.0.12 (Oxford Nanopore) with the super-accuracy model (dna_r9.4.1_450bps_sup_prom.cfg). FASTQ files for assembly were extracted from unaligned BAM files using samtools (Li et al. 2009), and the reads were then assembled with Flye version 2.9 using the --nano-hq flag (Kolmogorov et al. 2019). Haplotigs and overlaps in the assembly were purged using purge_dups (https://github.com/dfguan/purge_dups). The assembly was then polished using Medaka version 1.4.2 (https://github.com/nanoporetech/medaka), followed by a second polishing step with Pilon version 1.24 (Walker et al. 2014). Assembly statistics at each step were generated using QUAST (Gurevich et al. 2013) and BUSCO (Simão et al. 2015) (Table S2).

The primary contigs assembled from the Nanopore data were anchored to chromosomes using 505,210,505 read pairs from a Hi-C library derived from another A. cahirinus individual of unknown sex, downloaded from the NCBI Sequence Read Archive (SRX13258644) (Wang et al. 2022). After aligning the Hi-C reads with the Arima Hi-C Mapping Pipeline (https://github.com/ArimaGenomics/mapping_pipeline), YaHS v1.0 (Zhou et al. 2023) was used with default error correction for scaffolding, and Juicebox v1.11.08 (Dudchenko et al. 2018) was used to generate a Hi-C contact map.

Progressive Cactus (Armstrong et al. 2020) was used to perform a whole-genome alignment of the A. cahirinus draft assembly to the Mus musculus GRCm39 reference genome (RefSeq GCF_000001635.27_GRCm39). Comparative annotation of the draft genomes was then performed using the Comparative Annotation Toolkit (CAT) (Fiddes et al. 2018). Briefly, the M. musculus RefSeq annotation GFF was parsed and validated with the “parse_ncbi_gff3” and “validate_gff3” programs, respectively, from CAT. The M. musculus reference transcript cDNA sequences were downloaded, mapped to the M. musculus draft genome with minimap2 (Li 2018), and provided to CAT as long-read RNA-seq reads in the “[ISO_SEQ_BAM]” field of the configuration file. For A. cahirinus, bulk RNA-seq data obtained from multiple pooled organs were downloaded from NCBI SRA BioProject PRJNA342864 (Bellofiore et al. 2017), mapped to the draft assembly with STAR (Dobin et al. 2013), and provided to CAT in the “[BAMS]” field. CpG islands were identified using the cpg_lh utility from the UCSC suite of tools (Kent et al. 2002).
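
The assembly statistics mentioned above typically include contiguity metrics such as N50, the contig length at which contigs of that length or longer contain at least half of the assembled bases. The following is a minimal Python sketch of that calculation, not QUAST's implementation; the function names `fasta_lengths` and `n50` and the toy FASTA records are hypothetical and are not taken from the dataset.

```python
def fasta_lengths(lines):
    """Yield the length of each sequence in an iterable of FASTA lines."""
    length = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    lengths = sorted(lengths, reverse=True)
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig that brings the running sum to half the total.
        if 2 * running >= total:
            return length


# Toy input (hypothetical records, for illustration only).
toy_fasta = [">contig_1", "ACGTAC", "GT", ">contig_2", "ACG", ">contig_3", "A"]
toy_lengths = list(fasta_lengths(toy_fasta))
toy_n50 = n50(toy_lengths)
```

For a real assembly the same functions could be applied to an open FASTA file handle, for example `n50(fasta_lengths(open(path)))`, where `path` is a placeholder for the assembly file.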

Created:

2023-04-01