Optimizing exome captures in species with large genomes using species-specific repetitive DNA blocker

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.qfttdz0rw

Description:

Large and highly repetitive genomes are common. However, research interest usually lies in the non-repetitive parts of the genome, as they are more likely to be functional and can be used to answer questions related to adaptation, selection, and evolutionary history. Exome capture is a cost-effective method for obtaining sequencing data from the protein-coding parts of genes. C0t-1 DNA blockers consist of repetitive DNA and are used in exome captures to prevent repetitive DNA sequences from hybridizing to capture baits or to bait-bound genomic DNA. Universal blockers target repetitive regions shared by many species, whereas species-specific c0t-1 DNA is prepared from the DNA of the studied species, thus perfectly matching the repetitive DNA content of that species. So far, the use of species-specific c0t-1 DNA has been limited to a few model species.

Here, we evaluated the performance of blocker treatments in exome captures of Pinus sylvestris, a widely distributed conifer species with a large (> 20 Gbp) and highly repetitive genome. We compared treatment with a commercial universal blocker to treatments with species-specific c0t-1 DNA (30,000 ng and 60,000 ng). Species-specific c0t-1 captured more unique exons than the initial set of targets, which led to increased SNP discovery, and it reduced the sequencing of tandem repeats compared to the universal blocker. Based on our results, we recommend optimizing exome captures by using at least 60,000 ng of species-specific c0t-1 DNA. It is relatively easy and fast to prepare and can also be used with existing bait set designs.

Methods:

The original Pinus tabuliformis reference genome (v1.0; Niu et al., 2022) was masked to improve mapping to the genome by correcting for problems in the genome polishing, as many identical sequences were found at the ends of different chromosomes. To construct the masked version of the reference genome, chromosomes were first split back into contigs. Contigs were then aligned within chromosomes and between unplaced contigs using Minimap2 (Li, 2018). The alignments were then chained into longer ones using the ChainPaf module of Lep-Anchor (Rastas, 2020). Half of the aligning regions of > 10 kb were masked: for each alignment, the region in the shorter of the two contigs involved was masked.
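The masking rule described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes the chained alignments are available as standard 12-column PAF records (0-based, end-exclusive coordinates), that "> 10 kb" applies to the aligned span on both contigs, that alignments of a contig against itself are skipped, that ties in contig length are resolved by masking the query, and that masking means hard-masking with N. The function names `merge`, `regions_to_mask`, and `hard_mask` are hypothetical.

```python
from collections import defaultdict

MIN_ALN_LEN = 10_000  # threshold taken from the text (> 10 kb)


def merge(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def regions_to_mask(paf_lines, min_len=MIN_ALN_LEN):
    """Return {contig: [(start, end), ...]} for the shorter contig of each alignment."""
    raw = defaultdict(list)
    for line in paf_lines:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        qname, qlen, qstart, qend = f[0], int(f[1]), int(f[2]), int(f[3])
        tname, tlen, tstart, tend = f[5], int(f[6]), int(f[7]), int(f[8])
        if qname == tname:
            continue  # assumption: ignore alignments of a contig against itself
        if min(qend - qstart, tend - tstart) <= min_len:
            continue  # assumption: both aligned spans must exceed the threshold
        if qlen <= tlen:  # assumption: on a tie, mask the query contig
            raw[qname].append((qstart, qend))
        else:
            raw[tname].append((tstart, tend))
    return {name: merge(ivs) for name, ivs in raw.items()}


def hard_mask(seq, intervals):
    """Replace the given intervals of seq with N (assumed hard masking)."""
    chars = list(seq)
    for start, end in intervals:
        end = min(end, len(chars))  # guard against coordinates past the end
        if start < end:
            chars[start:end] = "N" * (end - start)
    return "".join(chars)
```

In use, `regions_to_mask` would be given the lines of the chained alignment file, and `hard_mask` would then be applied to each contig sequence listed in the result. Only one of the two aligned regions is masked per alignment, so one copy of each duplicated sequence stays available for read mapping.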

Created:

2024-11-15