Turnover of retroelements and satellite DNA drives centromere reorganization over short evolutionary timescales in Drosophila

Indexed by NIAID Data Ecosystem on 2026-05-02

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.1zcrjdg2g

Resource description:

Centromeres reside in rapidly evolving, repeat-rich genomic regions, despite their essential function in chromosome segregation. Across organisms, centromeres are rich in selfish genetic elements such as transposable elements (TEs) and satellite DNAs (satDNAs) that can bias their transmission through meiosis. However, these elements still need to cooperate at some level and contribute to, or avoid interfering with, centromere function. To gain insight into the balance between conflict and cooperation at centromeric DNA, we take advantage of the close evolutionary relationships within the Drosophila simulans clade (D. simulans, D. sechellia, and D. mauritiana) and their relative, D. melanogaster. Using chromatin profiling combined with high-resolution fluorescence in situ hybridization on stretched DNA, we characterize all centromeres across these species. We discover dramatic centromere reorganization involving recurrent shifts between retroelements and satellite DNAs over short evolutionary timescales. We also reveal the recent origin (<240 Kya) of telocentric chromosomes in D. sechellia, where the X and 4th centromeres now sit on telomere-specific retroelements. Finally, the centromeres of the Y chromosomes, the only chromosomes that do not experience female meiosis, do not show dynamic cycling between satDNA and TEs. The patterns of rapid centromere turnover in these species are consistent with genetic conflicts in the female germline and have implications for centromeric DNA function and karyotype evolution. Regardless of the evolutionary forces driving this turnover, the rapid reorganization of centromeric sequences over short evolutionary timescales highlights their potential as hotspots for evolutionary innovation.

Methods Overview: This repository contains data and code used in Courret et al. 2024 (https://doi.org/10.1371/journal.pbio.3002911).

CUT&Tag methods: We performed CUT&Tag using around 100,000 nuclei per sample.
We used the pA-Tn5 enzyme from EpiCypher and followed the manufacturer's protocol (CUT&Tag Protocol v1.5). For each species, we performed three replicates with the anti-CID20 antibody (1:50), one positive control using anti-H3K9me3 (1:100), and one negative control using the anti-IgG antibody (1:100). For library preparation, we used the primers in S8 Table of Courret et al. 2024. We analyzed each library on a Bioanalyzer for quality control; representative CENP-A and H3K27me3 profiles are provided in S11B Fig. Before final sequencing, we pooled 2 µl of each library and performed a MiSeq run. We used the number of reads obtained from each library to estimate its relative concentration and to ensure equal representation of each library in the final sequencing pool. We sequenced the libraries in 150-bp paired-end mode on an Illumina HiSeq. We obtained around 10 million reads per library, except for the IgG negative control, which usually has a lower representation (S9 Table).

G2/Jockey-3 evolutionary analyses: We identified G2/Jockey-3 sequences with two complementary methods. First, we annotated each genome assembly with our custom Drosophila TE library, which includes the D. melanogaster G2/Jockey-3 consensus sequence, using RepeatMasker v4.1.0. We extracted the annotations and their 500 bp flanking regions with BEDTools v2.29.0 and aligned them with MAFFT to generate a species-specific consensus sequence with Geneious v.8.1.6. We then annotated each assembly again using RepeatMasker with the appropriate species-specific G2/Jockey-3 consensus sequence. Second, we constructed de novo repeat libraries for each species with RepeatModeler2 v.2.0.1 and used BLAST v.2.10.0 to identify candidate G2/Jockey-3 sequences that shared high similarity with D. melanogaster G2/Jockey-3. We did the same with Jockey-1 (LINEJ1_DM) to confirm our methods and to use it as an outgroup for the TE fragment alignment.
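
The flank extraction step described above (annotations plus 500 bp on each side) can be illustrated with a minimal Python sketch. It is not the authors' code: the source states only that BEDTools v2.29.0 was used, not which subcommand or options. The function name `add_flanks`, the contig name, and the coordinates are hypothetical, and the sketch assumes 0-based, end-exclusive intervals that are clamped to the sequence bounds, similar in spirit to interval-padding tools such as `bedtools slop`.

```python
from typing import Dict, List, Tuple

# (sequence name, 0-based start, end-exclusive), as in BED coordinates
Interval = Tuple[str, int, int]


def add_flanks(
    intervals: List[Interval],
    seq_sizes: Dict[str, int],
    flank: int = 500,
) -> List[Interval]:
    """Extend each interval by `flank` bp on both sides.

    Coordinates are clamped so that no start falls below 0 and no end
    exceeds the length of the sequence the interval lies on.
    """
    extended = []
    for name, start, end in intervals:
        size = seq_sizes[name]
        extended.append((name, max(0, start - flank), min(size, end + flank)))
    return extended


# Hypothetical annotation hits on a hypothetical 6,000 bp contig.
hits = [("contig_1", 200, 900), ("contig_1", 5000, 5600)]
sizes = {"contig_1": 6000}

flanked = add_flanks(hits, sizes)
print(flanked)
```

Both example hits lie within 500 bp of a contig end, so the clamping matters here: without it, the padded intervals would extend past the sequence and fail at the extraction step.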
We removed candidates shorter than 100 bp from the analysis. We identified ORFs within the consensus TE sequences with NCBI ORFfinder. We used RepeatMasker to annotate the genome assemblies with the de novo Jockey-3 consensus sequences. To infer a phylogenetic tree of TEs, we aligned the G2/Jockey-3 fragments identified in each species with MAFFT and retained the sequences corresponding to the ORF bounds of the consensus sequences. We removed ORF fragments <400 bp. We inferred the tree with RAxML v.8.2.11 using the command “raxmlHPC-PTHREADS -s alignment_Jockey-3_melsimyak_400_ORF2_mafft.fasta -m GTRGAMMA -T 24 -d -p 12345 -# autoMRE -k -x 12345 -f a”.
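
As an illustration of the length filter applied before tree inference, the following Python sketch drops aligned fragments shorter than 400 bp. It is a hedged example, not the authors' script: the source does not say how fragment length was measured, so counting only non-gap characters is an assumption, and the function names and record names (`frag_long`, `frag_short`) are hypothetical.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA-formatted lines into a {record name: sequence} dict."""
    records: Dict[str, str] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def ungapped_length(seq: str) -> int:
    """Count residues, ignoring alignment gap characters ('-' and '.')."""
    return sum(1 for char in seq if char not in "-.")


def filter_fragments(records: Dict[str, str], min_len: int = 400) -> Dict[str, str]:
    """Keep records whose ungapped length is at least `min_len` bp."""
    return {n: s for n, s in records.items() if ungapped_length(s) >= min_len}


# Hypothetical aligned fragments: 450 bp and 200 bp of sequence, padded with gaps.
example = [
    ">frag_long",
    "A" * 450 + "-" * 50,
    ">frag_short",
    "ACGT" * 50 + "-" * 300,
]

kept = filter_fragments(parse_fasta(example))
print(sorted(kept))
```

Measuring ungapped length keeps the threshold tied to the actual amount of sequence in each fragment rather than to the alignment width, which is the same for every row.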

Created:

2024-11-22