Data for: The role of repetitive DNA in re-patterning of major rDNA clusters in Lepidoptera

Indexed from NIAID Data Ecosystem on 2026-05-01

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.gmsbcc2qj

Description:

Genes for major ribosomal RNAs (rDNA) are present in multiple copies organized in tandem arrays. The number and position of rDNA loci can change dynamically, and their re-patterning is presumably driven by repetitive sequences. We explored a peculiar rDNA organization in several representatives of Lepidoptera with either extremely large or numerous rDNA clusters. We combined molecular cytogenetics with analyses of second- and third-generation sequencing data to show that rDNA spreads as a transcription unit and to reveal associations between rDNA and various repeats. Furthermore, we performed comparative long-read analyses between the species with derived rDNA distribution and moths with a single rDNA locus, which is considered ancestral. Our results suggest that satellite arrays, rather than mobile elements, facilitate homology-mediated spread of rDNA via either integration of extrachromosomal rDNA circles or ectopic recombination. The latter arguably better explains the preferential spread of rDNA into terminal regions of lepidopteran chromosomes, as the efficiency of ectopic recombination depends on the proximity of homologous sequences to telomeres.

Methods

Repeat Explorer analysis

For analysis of repetitive DNA content, whole gDNA was sequenced on the Illumina platform, generating either 150 bp paired-end (PE) reads from a library with a mean insert size of 450 bp (Novogene Co., Ltd., Beijing, China) or, in the case of C. ohridella, 250 bp PE reads with a mean insert size of 700 bp (Genomics Core Facility, EMBL Heidelberg, Germany). The raw reads were quality filtered and trimmed to a uniform length of 120 bp (230 bp for C. ohridella) by Trimmomatic 3.2 (Bolger et al., 2014). A random sample of two million (one million for C. ohridella) trimmed PE reads was analysed by the Repeat Explorer (RE) pipeline (version cerit-v0.3.1-2706) implemented in the Galaxy environment (https://repeatexplorer-elixir.cerit-sc.cz/galaxy/) with automatic annotation via blastn and blastx using the Metazoan 3 Repeat Explorer database. The resulting html files were searched for clusters annotated as major rDNA and for their connections to other clusters.

Long read sequencing and analysis

High molecular weight DNA from H. humuli was enriched for fragments longer than 10 kbp by Short Read Eliminator (Circulomics Inc). The library was prepared with the Ligation Sequencing Kit SQK-LSK110 (Oxford Nanopore Technologies, Oxford, UK) according to the manufacturer's protocol, using the third-party consumables recommended therein. The library was snap-frozen, stored overnight at -70 °C, and then sequenced using an R10.3 flow cell on a MinION Mk1B (Oxford Nanopore Technologies). Reads were basecalled by Guppy 4.4.1 with the high-accuracy flip-flop algorithm. The data were filtered for reads of 15 kbp and longer with a quality score over 10 using NanoFilt (De Coster et al., 2018). Quality- and length-filtered reads were searched for the presence of major rDNA using blastn. Reads containing at least 1000 bp of the H. humuli major rDNA unit were assembled by Flye 2.8 (Kolmogorov et al., 2019) using a minimal overlap of 8 kbp. Mobile elements (MEs) were annotated by RepeatMasker 4.1.2-p1 (Smit et al., 2013) using protein-based masking. Tandem repeats were identified from self dot plots generated in Geneious 11.1.5. Consensus sequences of all identified ME fragments, together with the major rDNA unit, were mapped to individual rDNA-bearing nanopore reads using minimap2 (Li, 2018) with the appropriate preset. The presence and relative localization of individual elements were evaluated with an R script (R version 4.0.3 in RStudio version 1.4.1103); only regions with a mapping quality of at least 20 were considered.

Phymatopus californicus gDNA was sequenced on the Oxford Nanopore platform at Novogene Co., Ltd. PacBio HiFi reads of I. io (project PRJEB42130) and A. urticae (project PRJEB42112) were obtained through the Darwin Tree of Life project (http://www.darwintreeoflife.org). PacBio CLR data were obtained from the Sequence Read Archive (SRA) database (S. frugiperda SRR12642577; L. dispar SRR13505170-6, SRR13505182-3, and SRR13505187; P. xylostella SRR13530960). These reads were then processed in the same way as those of H. humuli, except for the HiFi reads, which were not quality filtered. A similar approach to detecting rDNA and associated repetitive DNA was also applied to the chromosome-level genome assembly of A. urticae (Bishop et al., 2021) (ENA acc. No. PRJEB41896).

Coverage analysis

Coverage analysis was done by aligning genomic Illumina sequencing reads from H. humuli, I. io, and A. urticae to consensus sequences using the Bowtie2 aligner (Langmead et al., 2019; Langmead and Salzberg, 2012). The consensus sequences were generated either by overlapping the contigs from RE in Geneious 11.1.5 or by the Flye 2.8 assembler. Coverage values were obtained using samtools depth (v 1.10) (Li et al., 2009) and plotted using an R script (R version 4.1.0 in RStudio Workbench version 1.4.1717-3). Mean coverage of the defined annotation blocks shown in Figure 3 was computed in R and is given in Suppl. Table 3.
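
The mean coverage per annotation block described in the coverage analysis could be computed along the following lines. This is a minimal Python sketch, not the authors' R script. It assumes the three-column, tab-separated output of samtools depth (reference name, 1-based position, depth). The reference name, block names, and coordinates are hypothetical and chosen only for illustration. Positions missing from the depth output are counted as zero coverage, which is an assumption about how gaps should be treated.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (block name, reference name, start, end); coordinates are 1-based, inclusive.
Block = Tuple[str, str, int, int]


def read_depth(lines: Iterable[str]) -> Dict[str, Dict[int, int]]:
    """Parse `samtools depth` style lines into {reference: {position: depth}}."""
    depth: Dict[str, Dict[int, int]] = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ref, pos, cov = line.split("\t")[:3]
        depth[ref][int(pos)] = int(cov)
    return depth


def mean_block_coverage(
    depth: Dict[str, Dict[int, int]], blocks: List[Block]
) -> Dict[str, float]:
    """Return the mean depth of each block; missing positions count as zero."""
    means: Dict[str, float] = {}
    for name, ref, start, end in blocks:
        per_pos = depth.get(ref, {})
        total = sum(per_pos.get(p, 0) for p in range(start, end + 1))
        means[name] = total / (end - start + 1)
    return means


# Hypothetical example data: a reference called "unit" with two made-up blocks.
example_lines = [
    "unit\t1\t10",
    "unit\t2\t20",
    "unit\t3\t30",
    "unit\t5\t50",  # position 4 is absent and is treated as depth 0
]
example_blocks = [("18S", "unit", 1, 3), ("spacer", "unit", 4, 5)]

example_means = mean_block_coverage(read_depth(example_lines), example_blocks)
for block_name, mean_cov in example_means.items():
    print(f"{block_name}\t{mean_cov:.1f}")
```

For real data, the lines would be read from a file written by samtools depth, and the blocks would come from the annotation of the consensus sequence.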

Created:

2023-05-19