Four new genome sequences of the Pallas's cat (Otocolobus manul): an insight into the patterns of within-species variability

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.pzgmsbcvt


Description:

The manul (Otocolobus manul) is the only representative of the genus Otocolobus, which belongs to the Leopard Cat lineage. Its habitat is characterized by harsh environmental conditions. Although manul populations are probably more stable than previously thought, their size is still declining. The main cause of the decline is the destruction of natural habitat, which, together with the species' natural behavior, results in geographically fragmented populations and a potential loss of genetic variability. Conservation programs exist to protect manuls, but those based on captive breeding are often unsuccessful because of the animals' increased susceptibility to diseases. The manul is therefore a suitable model species for evolutionary and diversity studies, as well as for studying mechanisms of adaptation to harsh environments and mechanisms of susceptibility to diseases. Whole genome sequencing (WGS) is an important tool for such studies, providing a base-by-base view of the genome. Recently, a genome of Otocolobus manul based on nanopore long-read sequencing was published. Using whole genome resequencing on the Illumina platform, we obtained genome data for four additional manuls, aiming to better understand inter- and intraspecific variation in the species. The parameters characterizing sequencing quality were within the standard range, and all four genomes were similar in most characteristics. On average, we detected a total of 3,668,327 polymorphic variants. Information on different types of structural variants not available from the reference genome was retrieved. The average whole-genome heterozygosity detected was almost identical to that found in the Otocolobus manul reference genome. In this context, we performed a more detailed analysis of the candidate gene EPAS1, which is potentially related to adaptation to a hypoxic environment.
This analysis revealed both inter- and intraspecific variation, confirmed the presence of a previously described non-synonymous substitution in exon 15 unique to manuls, and identified three additional unique non-synonymous substitutions located in EPAS1 exonic sequences that had not been analyzed before.

Methods

Tissues of four different manul individuals (one female and three males) obtained from Czech zoos (ZOO Jihlava, ZOO Brno, ZOO Prague) were selected based on available information about their origin and relatedness. The samples were obtained either post-mortem or as part of veterinary procedures performed for other reasons. DNA was extracted from the available tissue samples (blood, spleen or colon) using the MagAttract HMW DNA isolation kit (Qiagen, Germany) according to the manufacturer's recommendations. Two isolations were made for each individual. DNA purity (absorbance) and concentration were evaluated using an Infinite 200 Pro plate reader (Tecan, Switzerland). DNA samples were stored at 4°C for 5 days and then transported on dry ice to the Novogene sequencing facility in Germany. Samples were checked prior to library construction using an Agilent 5400 fragment analyzer. All samples passed QC (quantity ≥ 200 ng; OD260/280 = 1.8-2.0; no degradation). Sequencing was performed on the Illumina NovaSeq X Plus platform as a service provided by Novogene (China). A total of 0.2 μg DNA per sample was used as input material for DNA library preparation. The genomic DNA was randomly fragmented by sonication to a size of 350 bp. The DNA fragments were then end-polished, A-tailed, and ligated with the full-length adapter for Illumina sequencing. The adapter-ligated fragments were size-selected, PCR-amplified, and purified with the AMPure XP system (Beverly, USA). Subsequently, library quality was assessed on the Agilent 5400 system (Agilent, USA) and the libraries were quantified by qPCR (1.5 nM).
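The sample QC criteria quoted above (quantity ≥ 200 ng, OD260/280 between 1.8 and 2.0, no degradation) can be expressed as a simple check. The following is only an illustrative sketch: the `DnaSample` record, its field names and the example values are hypothetical and are not part of the dataset or of the sequencing provider's actual QC procedure.

```python
from dataclasses import dataclass


@dataclass
class DnaSample:
    """Hypothetical record for one DNA isolate; field names are illustrative."""
    sample_id: str
    quantity_ng: float   # total DNA amount in nanograms
    od_260_280: float    # absorbance ratio used as a purity indicator
    degraded: bool       # True if the fragment analyzer trace shows degradation


def passes_qc(sample: DnaSample) -> bool:
    """Apply the thresholds stated in the text: >= 200 ng, OD260/280 in 1.8-2.0, no degradation."""
    return (
        sample.quantity_ng >= 200.0
        and 1.8 <= sample.od_260_280 <= 2.0
        and not sample.degraded
    )


# Made-up example values, not measurements from the study.
samples = [
    DnaSample("example_A", 350.0, 1.85, False),
    DnaSample("example_B", 150.0, 1.90, False),  # too little DNA
    DnaSample("example_C", 400.0, 2.10, False),  # ratio outside the accepted range
]
passed = [s.sample_id for s in samples if passes_qc(s)]
print(passed)
```

Only the thresholds themselves come from the description; how they were measured and applied in practice is not specified beyond what is stated above.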
The qualified libraries were pooled and sequenced on the Illumina NovaSeq X Plus platform with PE150 chemistry. The distribution of sequencing quality along reads and the sequencing error rate were evaluated, and low-quality reads and adapters were filtered using fastp (v0.20.0) with the parameters -g -q 30 -u 50 -n 15 -l 150. The OtoMan_p1.0 genome (GCA_028564725.2) was used as the reference. The filtered sequencing data were mapped to the reference sequence with BWA (Li et al., 2009a) (parameters: mem -t 4 -k 32 -M). The resulting alignments were sorted using SAMtools (v1.13) with the parameters sort -@ 6 -m 2G and merged for each sample using Picard (v1.111). Single nucleotide polymorphisms (SNPs) and indels were called for the entire cohort (joint calling) using HaplotypeCaller from GATK (v4.0.5.1) (DePristo et al., 2011) with the following parameters: --pair-hmm-gap-continuation-penalty 10 -ERC GVCF --genotyping-mode DISCOVERY -stand-call-conf 30. The polymorphisms detected were annotated using ANNOVAR (v2015Dec14) (Wang et al., 2010), and their characteristics (e.g. quality, total numbers and distribution in different genomic regions) were evaluated. The original annotation record for the reference genome was kindly provided by the authors (Flack et al. 2023). For structural variants (SVs), BreakDancer (v1.4.4) (Chen et al., 2009) was used with default parameters to detect indels, inversions, intra-chromosomal translocations and inter-chromosomal translocations. The SVs detected were filtered by removing those with fewer than 2 supporting reads; indels and inversions were further annotated by ANNOVAR. Characteristics of the SVs, such as their total numbers, distribution across the genome and length, were assessed. Based on genome read depth, CNVnator (v0.3) (Abyzov et al., 2011) was used to detect copy number variants (CNVs), i.e. potential deletions and duplications, with the following parameter: -call 100.
The CNVs detected were further annotated by ANNOVAR and their characteristics were determined. The distribution of all types of variants across the whole genomes was visualized with Circos (Krzywinski et al., 2009). The genomic data were submitted to NCBI under the BioProject ID PRJNA1098449. The individual BioSample IDs for the four genomes are SAMN40907891, SAMN40907892, SAMN40907893 and SAMN40907894.
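As a rough illustration of the read-processing steps described in the methods, the sketch below assembles the fastp, BWA and SAMtools command lines with the parameters quoted in the text. It only builds and prints the commands and does not run them. All file names are hypothetical placeholders. The input/output options (`-i`, `-I`, `-o`, `-O` for fastp and `-o` for `samtools sort`) are standard options of those tools but are not stated in the description, and the way the authors actually chained the steps is not specified.

```python
import shlex
from typing import List


def fastp_cmd(r1: str, r2: str, out1: str, out2: str) -> List[str]:
    """fastp filtering with the parameters quoted in the methods (-g -q 30 -u 50 -n 15 -l 150)."""
    return ["fastp", "-i", r1, "-I", r2, "-o", out1, "-O", out2,
            "-g", "-q", "30", "-u", "50", "-n", "15", "-l", "150"]


def bwa_mem_cmd(reference: str, r1: str, r2: str) -> List[str]:
    """BWA-MEM mapping with the quoted parameters (mem -t 4 -k 32 -M); SAM goes to stdout."""
    return ["bwa", "mem", "-t", "4", "-k", "32", "-M", reference, r1, r2]


def samtools_sort_cmd(in_bam: str, out_bam: str) -> List[str]:
    """Coordinate sort with the quoted parameters (sort -@ 6 -m 2G)."""
    return ["samtools", "sort", "-@", "6", "-m", "2G", "-o", out_bam, in_bam]


# Hypothetical file names, used for illustration only.
commands = [
    fastp_cmd("manul1_R1.fq.gz", "manul1_R2.fq.gz",
              "manul1_R1.clean.fq.gz", "manul1_R2.clean.fq.gz"),
    bwa_mem_cmd("OtoMan_p1.0.fa", "manul1_R1.clean.fq.gz", "manul1_R2.clean.fq.gz"),
    samtools_sort_cmd("manul1.unsorted.bam", "manul1.sorted.bam"),
]
for cmd in commands:
    print(shlex.join(cmd))
```

Keeping each command as an argument list rather than a single string avoids shell quoting problems if the commands are later passed to `subprocess.run`.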

Created:

2024-10-16